-
Publication number: US09697827B1
Publication date: 2017-07-04
Application number: US13711478
Application date: 2012-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey Paul Lilly , Ryan Paul Thomas , Jeffrey Penrod Adams
Abstract: Features are disclosed for reducing errors in speech recognition processing. Methods for reducing errors can include receiving multiple speech recognition hypotheses based on an utterance indicative of a command or query of a user and determining a command or query within a grammar having a least amount of difference from one of the speech recognition hypotheses. The determination of the least amount of difference may be based at least in part on a comparison of individual subword units along at least some of the sequence paths of the speech recognition hypotheses and the grammar. For example, the comparison may be performed on the phoneme level instead of the word level.
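The phoneme-level comparison described in this abstract can be illustrated with a minimal sketch. It assumes that each hypothesis and each grammar entry has already been flattened into a single list of phoneme symbols (the abstract speaks of sequence paths, which this sketch does not model) and it uses plain Levenshtein distance as the "amount of difference". The names `edit_distance`, `closest_command`, `GRAMMAR`, and `HYPOTHESES`, and the phoneme strings themselves, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of subword units."""
    prev = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        curr = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_command(
    hypotheses: List[List[str]],
    grammar: Dict[str, List[str]],
) -> Tuple[str, int]:
    """Return the grammar command closest to any hypothesis, with its distance."""
    best_command, best_distance = "", -1
    for command, command_phonemes in grammar.items():
        for hypothesis in hypotheses:
            distance = edit_distance(hypothesis, command_phonemes)
            if best_distance < 0 or distance < best_distance:
                best_command, best_distance = command, distance
    return best_command, best_distance


# Hypothetical grammar and recognizer output, written as phoneme lists.
GRAMMAR = {
    "play music": ["P", "L", "EY", "M", "Y", "UW", "Z", "IH", "K"],
    "pause music": ["P", "AO", "Z", "M", "Y", "UW", "Z", "IH", "K"],
}
HYPOTHESES = [
    ["P", "L", "EY", "M", "Y", "UW", "S", "IH", "K"],
    ["P", "R", "EY", "M", "Y", "UW", "Z", "IH", "K"],
]

if __name__ == "__main__":
    print(closest_command(HYPOTHESES, GRAMMAR))
```

Comparing phonemes rather than words means that a misrecognized word differing by one sound still lands close to the intended command, whereas a word-level comparison would count the whole word as wrong.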
-
Publication number: US20150095026A1
Publication date: 2015-04-02
Application number: US14039383
Application date: 2013-09-27
Applicant: Amazon Technologies, Inc.
CPC classification number: G10L15/32 , G10L15/01 , G10L15/08 , G10L15/16 , G10L21/0272 , G10L25/78 , G10L2021/02166 , H04R1/406 , H04R3/005 , H04R2201/401 , H04R2410/01 , H04R2430/21
Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
-
Publication number: US11176936B2
Publication date: 2021-11-16
Application number: US16400905
Application date: 2019-05-01
Applicant: Amazon Technologies, Inc.
Inventor: Lambert Mathias , Ying Shi , Imre Attila Kiss , Ryan Paul Thomas , Frederic Johan Georges Deramat
IPC: G10L15/22 , G10L15/26 , G06F40/35 , G06F40/40 , G06F40/56 , G06F40/284 , G06F40/295 , G10L13/08
Abstract: Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
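The multi-domain flow in this abstract (several transcriptions, each interpreted by several domains in parallel, with one result selected) can be sketched as follows. This is only an illustration under stated assumptions: the keyword-counting domain interpreters, the `Interpretation` type, the function names, and the treatment of hints as a per-domain score bonus are all hypothetical and are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Interpretation:
    domain: str
    intent: str
    score: float


DomainFn = Callable[[str], Interpretation]


def make_keyword_domain(domain: str, intent: str, keywords: Set[str]) -> DomainFn:
    """Build a toy domain interpreter that scores by keyword coverage."""
    def interpret(transcription: str) -> Interpretation:
        words = transcription.lower().split()
        hits = sum(1 for word in words if word in keywords)
        score = hits / len(words) if words else 0.0
        return Interpretation(domain, intent, score)
    return interpret


# Hypothetical domains; a real system would use trained NLU models.
DOMAINS: List[DomainFn] = [
    make_keyword_domain("music", "PlayMusic", {"play", "song", "music", "album"}),
    make_keyword_domain("weather", "GetForecast", {"weather", "forecast", "rain"}),
    make_keyword_domain("shopping", "BuyItem", {"buy", "order", "purchase"}),
]


def process_multi_domain(
    transcriptions: List[str],
    domains: List[DomainFn] = DOMAINS,
    hints: Optional[Dict[str, float]] = None,
) -> Interpretation:
    """Interpret every transcription in every domain in parallel, then pick one.

    `hints` maps a domain name to a score bonus (an assumption of this sketch).
    """
    hints = hints or {}
    jobs = [(domain_fn, text) for domain_fn in domains for text in transcriptions]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda job: job[0](job[1]), jobs))
    return max(results, key=lambda r: r.score + hints.get(r.domain, 0.0))


if __name__ == "__main__":
    print(process_multi_domain(["play the new album", "play the new elbow"]))
```

When the domains score equally, a hint such as `{"weather": 0.1}` tips the selection toward the domain the user has interacted with before, which is one simple way to read the role of hints described in the abstract.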
-
Publication number: US10424292B1
Publication date: 2019-09-24
Application number: US13830222
Application date: 2013-03-14
Applicant: Amazon Technologies, Inc.
Inventor: John Daniel Thimsen , Gregory Michael Hart , Ryan Paul Thomas
Abstract: An audio controlled assistant captures environmental noise and converts the environmental noise into audio signals. The audio signals are provided to a system which analyzes the audio signals for a plurality of audio prompts, which have been customized for the acoustic environment surrounding the audio controlled assistant by an acoustic modeling system. The system is configured to detect the presence of an audio prompt in the audio signals and, in response, transmit instructions associated with the detected audio prompt to at least one of the audio controlled assistant or one or more cloud based services.
-
Publication number: US09436678B2
Publication date: 2016-09-06
Application number: US14754598
Application date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Lambert Mathias , Ying Shi , Imre Attila Kiss , Ryan Paul Thomas , Frederic Johan Georges Deramat
CPC classification number: G10L15/22 , G06F17/277 , G06F17/278 , G06F17/279 , G06F17/28 , G06F17/2881 , G10L13/08 , G10L15/26
Abstract: Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
-
Publication number: US10964315B1
Publication date: 2021-03-30
Application number: US15639330
Application date: 2017-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Minhua Wu , Sankaran Panchapagesan , Ming Sun , Shiv Naga Prasad Vitaladevuni , Bjorn Hoffmeister , Ryan Paul Thomas , Arindam Mandal
Abstract: An approach to wakeword detection uses an explicit representation of non-wakeword speech in the form of subword (e.g., phonetic monophone) units that do not necessarily occur in the wakeword and that broadly represent general speech. These subword units are arranged in a “background” model, which at runtime essentially competes with the wakeword model such that a wakeword is less likely to be declared as occurring when the input matches that background model well. An HMM may be used with the model to locate possible occurrences of the wakeword. Features are determined from portions of the input corresponding to subword units of the wakeword detected using the HMM. A secondary classifier is then used to process the features to yield a decision of whether the wakeword occurred.
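The idea of a background model competing with the wakeword model can be illustrated with a simple log-likelihood ratio test. This sketch is a deliberate simplification and not the method of the patent: it assumes per-frame log-likelihoods from both models are already available, it omits the HMM search and the secondary classifier, and the names `log_likelihood_ratio`, `detect_wakeword`, and the threshold value are hypothetical.

```python
from typing import Sequence


def log_likelihood_ratio(
    wake_logp: Sequence[float],
    background_logp: Sequence[float],
) -> float:
    """Average per-frame log-likelihood ratio of wakeword vs. background."""
    if not wake_logp or len(wake_logp) != len(background_logp):
        raise ValueError("score sequences must be non-empty and equal in length")
    total = sum(w - b for w, b in zip(wake_logp, background_logp))
    return total / len(wake_logp)


def detect_wakeword(
    wake_logp: Sequence[float],
    background_logp: Sequence[float],
    threshold: float = 1.0,  # hypothetical operating point
) -> bool:
    """Declare a wakeword only if it beats the background model by a margin."""
    return log_likelihood_ratio(wake_logp, background_logp) > threshold


if __name__ == "__main__":
    # Frames that fit the wakeword model much better than the background model.
    print(detect_wakeword([-1.0, -0.5, -0.75], [-3.0, -2.5, -2.75]))
    # Frames that the background (general speech) model explains better.
    print(detect_wakeword([-2.0, -2.5], [-1.0, -1.5]))
```

Because the score is a ratio, input that the background model explains well drives the ratio down, which mirrors the abstract's statement that the wakeword is less likely to be declared in that case.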
-
Publication number: US10283119B2
Publication date: 2019-05-07
Application number: US15966400
Application date: 2018-04-30
Applicant: Amazon Technologies, Inc.
Inventor: Lambert Mathias , Ying Shi , Imre Attila Kiss , Ryan Paul Thomas , Frederic Johan Georges Deramat
Abstract: Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
-
Publication number: US20180315425A1
Publication date: 2018-11-01
Application number: US15966400
Application date: 2018-04-30
Applicant: Amazon Technologies, Inc.
Inventor: Lambert Mathias , Ying Shi , Imre Attila Kiss , Ryan Paul Thomas , Frederic Johan Georges Deramat
CPC classification number: G10L15/22 , G06F17/277 , G06F17/278 , G06F17/279 , G06F17/28 , G06F17/2881 , G10L13/08 , G10L15/26
Abstract: Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
-
Publication number: US09959869B2
Publication date: 2018-05-01
Application number: US15694996
Application date: 2017-09-04
Applicant: Amazon Technologies, Inc.
Inventor: Lambert Mathias , Ying Shi , Imre Attila Kiss , Ryan Paul Thomas , Frederic Johan Georges Deramat
CPC classification number: G10L15/22 , G06F17/277 , G06F17/278 , G06F17/279 , G06F17/28 , G06F17/2881 , G10L13/08 , G10L15/26
Abstract: Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
-
Publication number: US20220148590A1
Publication date: 2022-05-12
Application number: US17454716
Application date: 2021-11-12
Applicant: Amazon Technologies, Inc.
Inventor: Lambert Mathias , Ying Shi , Imre Attila Kiss , Ryan Paul Thomas , Frederic Johan Georges Deramat
IPC: G10L15/22 , G10L15/26 , G06F40/35 , G06F40/40 , G06F40/56 , G06F40/284 , G06F40/295 , G10L13/08
Abstract: Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.