-
Publication No.: US10210862B1
Publication date: 2019-02-19
Application No.: US15091871
Application date: 2016-04-06
Applicant: Amazon Technologies, Inc.
Inventor: Faisal Ladhak , Ankur Gandhe , Markus Dreyer , Ariya Rastrow , Björn Hoffmeister , Lambert Mathias
IPC: G06F17/20 , G10L15/00 , G10L15/16 , G10L19/038 , G06N3/04
Abstract: Neural networks may be used in certain automatic speech recognition (ASR) systems. To improve the performance of these neural networks, the present system converts an ASR lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost while also placing the lattice in a form that may be manipulated by other components to perform operations such as checking ASR results. The matrix representation of the lattice may be transformed into a vector representation by calculations performed at a recurrent neural network (RNN). By representing the lattice as a vector, the system may perform additional operations, such as ASR results confirmation.
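
The lattice-to-matrix-to-vector idea in this abstract can be illustrated with a small sketch. The row layout (one row per arc, with one-hot start node, end node, and word plus a score), the toy lattice, and the fixed weights are all assumptions made for illustration; the abstract does not specify the patent's actual encoding or network.

```python
import math

# Hypothetical toy lattice: each arc is (start_node, end_node, word, score).
ARCS = [
    (0, 1, "play", 0.9),
    (0, 1, "pay", 0.1),
    (1, 2, "music", 0.8),
    (1, 2, "muse", 0.2),
]
VOCAB = sorted({word for _, _, word, _ in ARCS})


def lattice_to_matrix(arcs, vocab, num_nodes):
    """Build one row per arc: one-hot start node, one-hot end node,
    one-hot word, and the arc score (an assumed layout)."""
    rows = []
    for start, end, word, score in arcs:
        row = [0.0] * (2 * num_nodes + len(vocab) + 1)
        row[start] = 1.0
        row[num_nodes + end] = 1.0
        row[2 * num_nodes + vocab.index(word)] = 1.0
        row[-1] = score
        rows.append(row)
    return rows


def rnn_encode(matrix, hidden_size=4):
    """Elman-style recurrence h_t = tanh(W x_t + U h_{t-1}).
    The weights are fixed toy values, not trained parameters."""
    input_size = len(matrix[0])
    W = [[0.5 * math.sin(i + 2 * j + 1) for j in range(input_size)]
         for i in range(hidden_size)]
    U = [[0.5 * math.cos(i + j + 1) for j in range(hidden_size)]
         for i in range(hidden_size)]
    h = [0.0] * hidden_size
    for x in matrix:
        h = [
            math.tanh(
                sum(W[i][j] * x[j] for j in range(input_size))
                + sum(U[i][k] * h[k] for k in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
    return h  # the final hidden state serves as the lattice vector


matrix = lattice_to_matrix(ARCS, VOCAB, num_nodes=3)
lattice_vector = rnn_encode(matrix)
```

Unlike an N-best list, every arc and its score reaches the encoder, so competing words such as "play" and "pay" both contribute to the final vector.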
-
Publication No.: US11437043B1
Publication date: 2022-09-06
Application No.: US16712613
Application date: 2019-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Lizhen Peng , Alok Upadhyay , Jason Cline , Ankur Gandhe
Abstract: Systems and methods for presence ground truth approximation and utilization are disclosed. For example, a system detects the presence of a predefined subject, such as a person associated with a given user profile, and/or determines that authentication criteria for performing an action in association with the user profile have been satisfied. A period of time to associate data is determined, and data of one or more data types is labeled as being associated with the speaker identification event. That data may be formatted and input into one or more models to train those models to more accurately detect presence and/or determine whether authentication of a user profile should succeed.
-
Publication No.: US11302310B1
Publication date: 2022-04-12
Application No.: US16426557
Application date: 2019-05-30
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Roland Maximilian Rolf Maas , Bjorn Hoffmeister
IPC: G10L15/01 , G10L15/065 , G10L15/06
Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
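
The periodic check on the partial hypothesis can be sketched as a simple loop. The keyword-based `classify_domain` function is a hypothetical stand-in for the trained classifier, and the `check_every` interval is an assumed parameter; neither comes from the patent.

```python
def classify_domain(partial_hypothesis):
    """Hypothetical stand-in for the trained classifier: a keyword lookup
    that returns a domain name, or None when no domain is detected."""
    music_words = {"play", "song", "album"}
    return "music" if music_words & set(partial_hypothesis) else None


def decode_with_checks(word_stream, check_every=2):
    """Consume recognized words one at a time (each word stands for a few
    audio frames) and classify the partial hypothesis at a fixed interval.

    Returns a list of (words_seen, domain) decisions; a non-None domain
    marks a point where domain-specific rescoring would be triggered.
    """
    hypothesis, decisions = [], []
    for count, word in enumerate(word_stream, start=1):
        hypothesis.append(word)
        if count % check_every == 0:
            decisions.append((count, classify_domain(hypothesis)))
    return decisions


decisions = decode_with_checks(["please", "play", "some", "jazz"])
```

In this sketch the decision is made on the words recognized so far rather than on the finished utterance, which is the point the abstract makes about partial hypotheses.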
-
Publication No.: US20220036893A1
Publication date: 2022-02-03
Application No.: US17405677
Application date: 2021-08-18
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Gautam Tiwari , Ashish Vishwanath Shenoy , Chun Chen
IPC: G10L15/193 , G10L15/22
Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models, and grammar models, the system modifies only one grammar model and ensures its compatibility with the existing models in the ASR system.
-
Publication No.: US10176802B1
Publication date: 2019-01-08
Application No.: US15091722
Application date: 2016-04-06
Applicant: Amazon Technologies, Inc.
Inventor: Faisal Ladhak , Ankur Gandhe , Markus Dreyer , Ariya Rastrow , Björn Hoffmeister , Lambert Mathias
IPC: G10L15/16 , G10L19/038 , G06N3/04
Abstract: An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be used by the system to perform additional operations, such as ASR results confirmation.
-
Publication No.: US12014726B2
Publication date: 2024-06-18
Application No.: US17706057
Application date: 2022-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Roland Maximilian Rolf Maas , Bjorn Hoffmeister
IPC: G10L15/22 , G10L15/01 , G10L15/06 , G10L15/065
CPC classification number: G10L15/065 , G10L15/01 , G10L15/063
Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
-
Publication No.: US11705116B2
Publication date: 2023-07-18
Application No.: US17405677
Application date: 2021-08-18
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Gautam Tiwari , Ashish Vishwanath Shenoy , Chun Chen
IPC: G10L15/193 , G10L15/22 , G10L15/30
CPC classification number: G10L15/193 , G10L15/22 , G10L15/30 , G10L2015/223
Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models, and grammar models, the system modifies only one grammar model and ensures its compatibility with the existing models in the ASR system.
-
Publication No.: US11211058B1
Publication date: 2021-12-28
Application No.: US16577394
Application date: 2019-09-20
Applicant: Amazon Technologies, Inc.
Inventor: Aaron Eakin , Angela Sun , Ankur Gandhe , Ariya Rastrow , Chenlei Guo , Xing Fan
IPC: G10L15/197 , G10L15/30 , G10L15/22
Abstract: Described herein is a system for prompting a user for clarification when an automatic speech recognition (ASR) system encounters ambiguity with respect to the user's input. The feedback provided by the user is used to retrain machine-learning models and/or to generate new machine-learning models. Based on the type of ambiguity, the system may determine to retrain one or more ASR models that are widely used by the system or to generate/update one or more user-specific models that are used to process inputs from one or more particular users.
-
Publication No.: US10121467B1
Publication date: 2018-11-06
Application No.: US15197923
Application date: 2016-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Denis Sergeyevich Filimonov , Ariya Rastrow , Björn Hoffmeister
IPC: G10L15/06 , G10L15/183 , G10L15/16 , G10L15/197
Abstract: A language model for automatic speech processing, such as a finite state transducer (FST), may be configured to incorporate information about how a particular word sequence (N-gram) may be used in a manner similar to another N-gram. A score of a component of the FST (such as an arc or state) relating to the first N-gram may be based on information of the second N-gram. Further, the FST may be configured to have an arc between a state of the first N-gram and a state of the second N-gram to allow for cross N-gram backoff, rather than backoff from a larger N-gram to a smaller N-gram, during traversal of the FST during speech processing.
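
Cross N-gram backoff can be contrasted with ordinary backoff in a small scoring sketch. The probability tables, the `CROSS_BACKOFF` links, and the penalty values are invented for illustration, and a dictionary lookup stands in for FST states and arcs; none of this is taken from the patent itself.

```python
import math

# Hypothetical toy model: log-probabilities for seen bigrams and unigrams.
BIGRAM_LOGP = {
    ("play", "music"): math.log(0.5),
    ("play", "songs"): math.log(0.3),
}
UNIGRAM_LOGP = {"music": math.log(0.01), "songs": math.log(0.008)}

# Assumed cross N-gram arcs: a history links to a similar history of the
# same order, with a penalty for taking the link.
CROSS_BACKOFF = {("start",): (("play",), math.log(0.8))}

# Assumed penalty for ordinary backoff to the shorter (unigram) context.
STANDARD_BACKOFF_LOGP = math.log(0.4)


def score(history, word):
    """Return log P(word | history), trying cross N-gram backoff
    before falling back to the unigram estimate."""
    if history + (word,) in BIGRAM_LOGP:
        return BIGRAM_LOGP[history + (word,)]
    if history in CROSS_BACKOFF:
        similar, penalty = CROSS_BACKOFF[history]
        if similar + (word,) in BIGRAM_LOGP:
            # Borrow the estimate from the similar N-gram's state.
            return penalty + BIGRAM_LOGP[similar + (word,)]
    return STANDARD_BACKOFF_LOGP + UNIGRAM_LOGP[word]


cross = score(("start",), "music")    # follows the link to ("play",)
standard = score(("stop",), "music")  # no link, so unigram backoff applies
```

With these toy numbers the cross backoff path scores "start music" much higher than unigram backoff would, because it reuses the context-sensitive estimate of the similar history.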
-
Publication No.: US20220358908A1
Publication date: 2022-11-10
Application No.: US17706057
Application date: 2022-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Roland Maximilian Rolf Maas , Bjorn Hoffmeister
IPC: G10L15/065 , G10L15/01 , G10L15/06
Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.