-
Publication No.: US20140309995A1
Publication date: 2014-10-16
Application No.: US14317873
Filing date: 2014-06-27
Inventors: Kjell Schubert, Juergen Fritsch, Michael Finke, Detlef Koll
IPC classification: G10L15/26
CPC classification: G10L15/26, G06F17/273, G06F17/2785, G10L15/1807, G10L15/22, G10L15/265, G10L21/04
Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
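The emphasis rule in this abstract can be illustrated with a small sketch. It is not the patented implementation: the `Region` fields, the two thresholds, and the rate multipliers are all hypothetical values chosen for illustration. The only idea taken from the abstract is that regions that are highly relevant or likely mistranscribed are played more slowly than regions that are of low relevance and likely correct.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: float       # seconds into the audio stream
    end: float         # seconds into the audio stream
    relevance: float   # 0..1, hypothetical importance score for the region
    confidence: float  # 0..1, hypothetical recognizer confidence for the region


def playback_rate(region, slow=0.75, fast=1.5,
                  relevance_threshold=0.5, confidence_threshold=0.8):
    """Return a playback-rate multiplier for one region.

    Thresholds and rates are assumptions, not values from the patent.
    """
    relevant = region.relevance >= relevance_threshold
    likely_wrong = region.confidence < confidence_threshold
    # Emphasize (slow down) anything important or probably mistranscribed.
    return slow if (relevant or likely_wrong) else fast


def total_playback_seconds(regions):
    """Wall-clock time needed to play all regions at their chosen rates."""
    return sum((r.end - r.start) / playback_rate(r) for r in regions)
```

For example, three 3-second regions where only the first is both low-relevance and high-confidence would play at rates 1.5, 0.75, and 0.75, so the proofreader spends most of the listening time on the two emphasized regions.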
-
Publication No.: US20150095025A1
Publication date: 2015-04-02
Application No.: US14571697
Filing date: 2014-12-16
Inventors: Juergen Fritsch, Anoop Deoras, Detlef Koll
CPC classification: G10L19/0018, G10L15/063, G10L15/197, G10L15/26, G10L15/265
Abstract: Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a transcription of speech in which the tokens were not explicitly verbalized. Token prediction may be integrated with speech decoding, rather than performed as a post-process to speech decoding.
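To make "integrated with speech decoding" concrete, the toy sketch below lets a beam search hypothesize an unspoken punctuation token at every word boundary, scored by the same language model that scores the words. Everything here is an assumption made for illustration: the bigram table, the floor score, the `<s>`/`</s>` markers, and the function names are hypothetical, and the abstract does not say which decoder or language model is used.

```python
PUNCT = [",", "."]  # non-verbalized tokens the decoder may insert


def lm_logprob(prev, tok, bigrams, floor=-6.0):
    """Toy bigram language-model score; unseen pairs get a floor value."""
    return bigrams.get((prev, tok), floor)


def with_optional_punct(score, toks, bigrams):
    """Yield the hypothesis as-is, plus one variant per inserted punctuation token."""
    yield score, toks
    for p in PUNCT:
        # Punctuation carries no acoustic score: it was never spoken.
        yield score + lm_logprob(toks[-1], p, bigrams), toks + [p]


def decode(acoustic, bigrams, beam=4):
    """acoustic: one dict per spoken word slot, mapping word -> acoustic log-prob."""
    hyps = [(0.0, ["<s>"])]
    for slot in acoustic:
        expanded = []
        for score, toks in hyps:
            for pscore, ptoks in with_optional_punct(score, toks, bigrams):
                for word, ac in slot.items():
                    total = pscore + ac + lm_logprob(ptoks[-1], word, bigrams)
                    expanded.append((total, ptoks + [word]))
        # Punctuated and unpunctuated hypotheses compete in the same beam.
        hyps = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam]
    finals = []
    for score, toks in hyps:
        for pscore, ptoks in with_optional_punct(score, toks, bigrams):
            finals.append((pscore + lm_logprob(ptoks[-1], "</s>", bigrams), ptoks))
    return max(finals, key=lambda h: h[0])[1][1:]  # drop the "<s>" marker
```

Because punctuated and unpunctuated hypotheses are ranked together during the search, a punctuation choice can change which word sequence survives pruning, which a separate post-processing pass over a finished transcript cannot do.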
-
Publication No.: US20130166297A1
Publication date: 2013-06-27
Application No.: US13773928
Filing date: 2013-02-22
IPC classification: G10L15/06
CPC classification: G10L15/063, G06F17/271, G06F17/2775, G06F17/28, G10L15/02, G10L15/183, G10L15/193, G10L15/26, G10L2015/0631, G10L2015/0633
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
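The step of recovering the actual spoken form can be sketched as follows. This is a simplified stand-in, not the method claimed in the patent: it assumes a hypothetical lookup table `SPOKEN_FORMS` of candidate verbalizations and a rough word-level recognition result for the relevant stretch of audio, and it picks the candidate with the highest word-sequence similarity using Python's `difflib`. A real system would more plausibly score the candidates against the audio itself.

```python
import difflib

# Hypothetical table: written concept -> ways a speaker might have said it.
SPOKEN_FORMS = {
    "12/25/2003": [
        "december twenty fifth two thousand three",
        "twelve twenty five oh three",
        "the twenty fifth of december two thousand and three",
    ],
}


def best_spoken_form(written, heard_words):
    """Pick the candidate verbalization closest to what was roughly recognized."""
    candidates = SPOKEN_FORMS.get(written, [written])

    def similarity(form):
        # SequenceMatcher works on any sequences of hashable items, here word lists.
        return difflib.SequenceMatcher(None, form.split(), heard_words).ratio()

    return max(candidates, key=similarity)


def revise_transcript(tokens, heard):
    """Replace multi-form tokens with their most likely spoken form.

    heard maps a token index to the words roughly recognized at that position.
    """
    revised = []
    for i, tok in enumerate(tokens):
        if tok in SPOKEN_FORMS and i in heard:
            revised.append(best_spoken_form(tok, heard[i]))
        else:
            revised.append(tok)
    return " ".join(revised)
```

The revised transcript then reads the way the audio sounds ("twelve twenty five oh three" rather than "12/25/2003"), which is the property the abstract relies on when it uses the revised transcript as training text for the acoustic model.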
-
Publication No.: US20130103400A1
Publication date: 2013-04-25
Application No.: US13691249
Filing date: 2012-11-30
IPC classification: G10L15/26
CPC classification: G10L15/063, G10L15/193, G10L15/26
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
-