-
Publication number: US09514747B1
Publication date: 2016-12-06
Application number: US14011898
Application date: 2013-08-28
Applicant: Amazon Technologies, Inc.
Inventor: Michael Maximilian Emanuel Bisani, Hugh Evan Secker-Walker, Kenneth John Basye, Alexander David Rosen
CPC classification number: G10L15/08, G10L25/60, G10L2015/085
Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to reduce the latency of returning speech results to a user. The latency may be determined by comparing a time stamp of an utterance in process to a current time. Latency may also be estimated based on an endpoint of the utterance or other considerations such as how difficult the utterance may be to process. To reduce latency, the ASR system may be configured to adjust various processing parameters, such as graph pruning factors, path weights, ASR models, etc. Latency checks and corrections may occur dynamically for a particular utterance while it is being processed, thus allowing the ASR system to adjust to rapidly changing latency conditions.
-
Publication number: US09437186B1
Publication date: 2016-09-06
Application number: US13921671
Application date: 2013-06-19
Applicant: Amazon Technologies, Inc.
Inventor: Baiyang Liu, Hugh Evan Secker-Walker, Alexander David Rosen
CPC classification number: G10L15/05, G10L15/00, G10L15/1815, G10L15/19, G10L15/22, G10L25/78, G10L2015/223
Abstract: Determining the end of an utterance for purposes of automatic speech recognition (ASR) may be improved with a system that provides early results and/or incorporates semantic tagging. Early ASR results of an incoming utterance may be prepared based at least in part on an estimated endpoint and processed by a natural language understanding (NLU) process while final results, based at least in part on a final endpoint, are determined. If the early results match the final results, the early NLU results are already prepared for early execution. The endpoint may also be determined based at least in part on the content of the utterance, as represented by semantic tagging output from ASR processing. If the tagging indicates completion of a logical statement, an endpoint may be declared, or a threshold for silent frames prior to declaring an endpoint may be adjusted.
-
Publication number: US09728188B1
Publication date: 2017-08-08
Application number: US15195587
Application date: 2016-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Alexander David Rosen, Michael James Rodehorst, George Jay Tucker, Aaron Lee Mathers Challenner
CPC classification number: G10L15/22, G10L19/08, G10L25/18, G10L25/51, G10L2015/223
Abstract: Systems and methods for detecting similar audio being received by separate voice activated electronic devices, and ignoring those commands, are described herein. In some embodiments, a voice activated electronic device may be activated by a wakeword that is output by an additional electronic device, such as a television or radio, may capture audio of sound subsequently following the wakeword, and may send audio data representing the sound to a backend system. Upon receipt, the backend system may, in parallel with performing automated speech recognition processing on the audio data, generate a sound profile of the audio data, and may compare that sound profile to sound profiles of recently received audio data and/or flagged sound profiles. If the generated sound profile is determined to match another sound profile, then the automated speech recognition processing may be stopped, and the voice activated electronic device may be instructed to return to a keyword spotting mode. If the matching sound profile is not already stored in a database of known sound profiles, it can be stored for future comparisons.
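The compare-then-stop flow in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how a sound profile is computed, so the energy-trend fingerprint below, the function names (`sound_profile`, `bit_error_rate`, `handle_audio`), and the `max_error` threshold are all assumptions made for illustration, not the patented method. The sketch also simplifies storage by remembering every unmatched profile as "recently received" audio.

```python
def sound_profile(samples, frame_size=4):
    """Hypothetical fingerprint: one bit per adjacent frame pair,
    set to 1 when frame energy rises and 0 otherwise."""
    energies = [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def bit_error_rate(profile_a, profile_b):
    """Fraction of differing bits over the shared length (1.0 if empty)."""
    n = min(len(profile_a), len(profile_b))
    if n == 0:
        return 1.0
    return sum(a != b for a, b in zip(profile_a, profile_b)) / n


def handle_audio(samples, known_profiles, max_error=0.1):
    """Return 'stop_asr' when the new profile matches a known one;
    otherwise remember the profile and return 'continue_asr'."""
    profile = sound_profile(samples)
    for known in known_profiles:
        if bit_error_rate(profile, known) <= max_error:
            return "stop_asr"
    known_profiles.append(profile)
    return "continue_asr"
```

In this sketch, a second device that receives the same broadcast at a different volume yields the same bits, because only the direction of the energy change is kept. A production fingerprint would more likely work on spectral features.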
-
Publication number: US09378740B1
Publication date: 2016-06-28
Application number: US14502572
Application date: 2014-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Alexander David Rosen, Yuwang Yin
CPC classification number: G10L15/1822, G06F17/3097, G10L2015/223, G10L2015/228
Abstract: Features are disclosed for identifying and providing command suggestions during automatic speech recognition. As utterances are interpreted, suggestions may be provided based on even partial interpretations to guide users of a client device to commands available via speech recognition.
-
Publication number: US10074364B1
Publication date: 2018-09-11
Application number: US15085772
Application date: 2016-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Colin Wills Wightman, Naresh Narayanan, Alexander David Rosen, Michael James Rodehorst, Daniel Robert Rashid
CPC classification number: G10L15/20, G06F17/2775, G10L15/10, G10L15/26, G10L15/265, G10L17/04, G10L25/51, G10L2015/223
Abstract: Systems and methods for generating sound profiles of artificial commands detected by multiple voice activated electronic devices are described herein. In some embodiments, numerous voice activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase, and counts for instances of that text data, may be generated. If the number of counts exceeds a predefined threshold, the backend system may cause any remaining response generation functionality for that particular command that is in excess of the predefined threshold to be stopped, and those devices to be returned to a sleep state. In some embodiments, a sound profile unique to the phrase that caused the excess of the predefined threshold may be generated such that future instances of the same phrase may be recognized prior to text data being generated, conserving the backend system's resources.
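The counting step in this abstract can be sketched in Python as a sliding-window counter keyed by recognized text. The class name `PhraseCounter`, the window length, and the threshold value are hypothetical; the abstract only says that counts of identical text data are compared against a predefined threshold.

```python
from collections import defaultdict, deque


class PhraseCounter:
    """Hypothetical counter of identical phrases seen within a time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.arrivals = defaultdict(deque)  # phrase text -> arrival timestamps

    def should_stop(self, text, now):
        """Record one arrival; return True when the count of this phrase
        inside the window exceeds the threshold."""
        timestamps = self.arrivals[text]
        timestamps.append(now)
        # Drop arrivals that fell out of the window.
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()
        return len(timestamps) > self.threshold
```

With `PhraseCounter(threshold=2, window_seconds=1.0)`, the third identical phrase arriving within one second is flagged, which in the described system would be the point at which remaining response generation is stopped for the excess requests.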
-
Publication number: US09934777B1
Publication date: 2018-04-03
Application number: US15248211
Application date: 2016-08-26
Applicant: Amazon Technologies, Inc.
Inventor: Shaun Nidhiri Joseph, Sonal Pareek, Ariya Rastrow, Gautam Tiwari, Alexander David Rosen
CPC classification number: G10L15/063, G10L15/02, G10L15/08, G10L15/1815, G10L15/193, G10L15/22, G10L15/30, G10L2015/025, G10L2015/0635
Abstract: User-specific language models (LMs) are described that include internal word indices to a word table specific to the user-specific LM rather than a word table specific to a system-wide LM. When the system-wide LM is updated, the word table of the user-specific LM may be updated to translate the user-specific indices to system-wide indices. This avoids having to update the internal indices of the user-specific LM every time the system-wide LM is updated.
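The index indirection in this abstract can be shown with a small Python sketch. The classes `SystemLM` and `UserLM`, the bigram-count representation, and the `rebind` method are illustrative assumptions; the abstract specifies only that the user-specific LM keeps its own word table, which is translated to system-wide indices.

```python
class SystemLM:
    """Hypothetical system-wide LM reduced to its vocabulary."""

    def __init__(self, words):
        self.words = list(words)
        self.index_of = {word: i for i, word in enumerate(self.words)}


class UserLM:
    """Hypothetical user-specific LM whose internal data use local indices."""

    def __init__(self, words, bigram_counts):
        self.local_words = list(words)           # local index -> word
        self.bigram_counts = dict(bigram_counts)  # (local_i, local_j) -> count
        self.local_to_system = []                 # local index -> system index

    def rebind(self, system_lm):
        """After a system-wide update, rebuild only the translation table.
        Words missing from the system vocabulary map to -1."""
        self.local_to_system = [
            system_lm.index_of.get(word, -1) for word in self.local_words
        ]

    def system_bigrams(self):
        """View the unchanged internal counts through system-wide indices."""
        table = self.local_to_system
        return {(table[i], table[j]): count
                for (i, j), count in self.bigram_counts.items()}
```

When the system vocabulary changes, only `rebind` runs. The stored `bigram_counts` keep their local indices, which is the saving the abstract points to.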
-
Publication number: US09613624B1
Publication date: 2017-04-04
Application number: US14314563
Application date: 2014-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Jake Simon Kramer, Alexander David Rosen, Kenneth John Basye
CPC classification number: G10L15/08, G10L2015/085
Abstract: In a dynamic automatic speech recognition (ASR) processing system, ASR processing may be configured to estimate a latency of returning speech results to a user based on work being done by an ASR processor. The ASR processing system may measure work done by an ASR processor by measuring one or more time independent metrics and comparing the metrics to threshold values. If the metrics exceed the thresholds, the ASR system may take steps to reduce latency associated with processing the utterance, including adjusting a speech recognition parameter.
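As a rough Python illustration of comparing a time-independent work metric to a threshold, the sketch below uses the average number of active decoder hypotheses per frame as the metric and a pruning beam width as the adjusted speech recognition parameter. Both choices, the function name `adjust_beam`, and all numeric defaults are assumptions; the abstract names neither a specific metric nor a specific parameter.

```python
def adjust_beam(active_tokens_per_frame, beam,
                max_avg_tokens=1000.0, min_beam=8.0, step=0.75):
    """Return a possibly narrowed beam width.

    active_tokens_per_frame: hypothetical work metric, one count per frame,
    independent of wall-clock time.
    """
    if not active_tokens_per_frame:
        return beam
    average = sum(active_tokens_per_frame) / len(active_tokens_per_frame)
    if average > max_avg_tokens:
        # Too much work per frame: prune harder, but never below min_beam.
        return max(min_beam, beam * step)
    return beam
```

Calling `adjust_beam` periodically during decoding would let the parameter change while the utterance is still being processed, at the cost of possibly pruning the correct hypothesis when the beam gets narrow.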