-
Publication No.: US12211517B1
Publication date: 2025-01-28
Application No.: US17475699
Filing date: 2021-09-15
Applicant: Amazon Technologies, Inc.
Inventor: Roland Maximilian Rolf Maas , Bjorn Hoffmeister , Ariya Rastrow , James Garnet Droppo , Veerdhawal Pande , Maarten Van Segbroeck , Gautam Tiwari , Andrew Smith , Eli Joshua Fidler
Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally include determining an endpoint score representing a likelihood that the potential endpoint represents an end of speech representing a complete user input. When the potential endpoint has been determined, the system may publish a transcript of speech that preceded the potential endpoint, and send it to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete; e.g., represents the entirety of the user input. Final endpoint determinations may be made based on the results of the downstream processing including automatic speech recognition, natural language understanding, etc.
-
Publication No.: US20230223023A1
Publication date: 2023-07-13
Application No.: US18149181
Filing date: 2023-01-03
Applicant: Amazon Technologies, Inc.
Inventor: Ariya Rastrow , Eli Joshua Fidler , Roland Maximilian Rolf Maas , Nikko Strom , Aaron Eakin , Diamond Bishop , Bjorn Hoffmeister , Sanjeev Mishra
CPC classification number: G10L15/22 , G10L15/26 , G10L15/1815 , G10L2015/088 , G10L2015/223 , G10L2015/228
Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
-
Publication No.: US11496582B2
Publication date: 2022-11-08
Application No.: US16455604
Filing date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Ariya Rastrow , Tony Hardie , Rohit Prasad
Abstract: Systems, methods, and devices for computer-generating responses and sending responses to communications when the recipient of the communication is unavailable are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes it difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message originating individual.
-
Publication No.: US11302310B1
Publication date: 2022-04-12
Application No.: US16426557
Filing date: 2019-05-30
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Roland Maximilian Rolf Maas , Bjorn Hoffmeister
IPC: G10L15/01 , G10L15/065 , G10L15/06
Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
-
Publication No.: US20220093101A1
Publication date: 2022-03-24
Application No.: US17112520
Filing date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Ying Shi , David Chi-Wai Tang , Nishtha Gupta , Aaron Challenner , Bonan Zheng , Angeliki Metallinou , Vincent Auvray , Minmin Shen
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
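The offset-time lookup described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the stored data is a sorted list of playback start times (one per list entry, in seconds from the start of audio output), and the names `entry_at_offset`, `entries`, and `start_times` are hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def entry_at_offset(start_times: Sequence[float], offset: float) -> Optional[int]:
    """Return the index of the entry most recently started at `offset` seconds.

    Assumes `start_times` is sorted ascending (hypothetical stored timing data).
    Returns None if the interruption happened before the first entry began.
    """
    if not start_times or offset < start_times[0]:
        return None
    # bisect_right counts the entries whose start time is <= offset.
    return bisect_right(start_times, offset) - 1


# Hypothetical playback timeline for a three-entry list.
entries = ["first result", "second result", "third result"]
start_times = [0.0, 2.5, 5.0]

# The user says "that one" 3.1 seconds after playback began.
index = entry_at_offset(start_times, 3.1)
selected = entries[index] if index is not None else None
```

A binary search keeps the lookup cheap even for long lists; a real system would presumably also need to account for network and audio-pipeline latency, which this sketch ignores.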
-
Publication No.: US20220036893A1
Publication date: 2022-02-03
Application No.: US17405677
Filing date: 2021-08-18
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Ariya Rastrow , Gautam Tiwari , Ashish Vishwanath Shenoy , Chun Chen
IPC: G10L15/193 , G10L15/22
Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.
-
Publication No.: US10381000B1
Publication date: 2019-08-13
Application No.: US15864689
Filing date: 2018-01-08
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Gautam Tiwari , Shaun Nidhiri Joseph , Ariya Rastrow
IPC: G10L15/00 , G10L15/193 , G10L15/18 , G10L15/06 , G10L15/02
Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime portions of the FSTs may be decompressed for processing by an ASR engine.
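The idea of binning arc weights to reduce the bits stored per weight can be illustrated with a small sketch. The abstract does not say how the bins are chosen, so uniform bins between the minimum and maximum weight are an assumption here, and the helper names `build_bins`, `compress`, and `decompress` are hypothetical.

```python
from typing import Sequence, Tuple


def build_bins(weights: Sequence[float], num_bins: int) -> Tuple[float, float]:
    """Return (lowest weight, bin width) for uniform binning (an assumed scheme)."""
    lo, hi = min(weights), max(weights)
    if num_bins < 2 or hi == lo:
        return lo, 0.0
    return lo, (hi - lo) / (num_bins - 1)


def compress(weight: float, lo: float, step: float) -> int:
    """Map a weight to the index of its nearest bin; the index is what gets stored."""
    return round((weight - lo) / step) if step else 0


def decompress(index: int, lo: float, step: float) -> float:
    """Recover the approximate weight from a stored bin index at runtime."""
    return lo + index * step


# Hypothetical arc weights; 4 bins means each weight fits in 2 bits
# instead of a 32-bit float.
weights = [0.0, 0.4, 1.7, 3.0]
lo, step = build_bins(weights, num_bins=4)
codes = [compress(w, lo, step) for w in weights]
restored = [decompress(c, lo, step) for c in codes]
```

With uniform bins, each restored weight differs from the original by at most half a bin width, so the number of bins trades memory against recognition accuracy.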
-
Publication No.: US10176802B1
Publication date: 2019-01-08
Application No.: US15091722
Filing date: 2016-04-06
Applicant: Amazon Technologies, Inc.
Inventor: Faisal Ladhak , Ankur Gandhe , Markus Dreyer , Ariya Rastrow , Björn Hoffmeister , Lambert Mathias
IPC: G10L15/16 , G10L19/038 , G06N3/04
Abstract: An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be used by the system to perform additional operations, such as ASR results confirmation.
-
Publication No.: US10049656B1
Publication date: 2018-08-14
Application No.: US14033346
Filing date: 2013-09-20
Applicant: Amazon Technologies, Inc.
Inventor: William Folwell Barton , Rohit Prasad , Stephen Frederick Potter , Nikko Strom , Yuzo Watanabe , Madan Mohan Rao Jampani , Ariya Rastrow , Arushan Rajasekaram
Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.
-
Publication No.: US12205574B1
Publication date: 2025-01-21
Application No.: US17208615
Filing date: 2021-03-22
Applicant: Amazon Technologies, Inc.
Inventor: Grant Strimel , Ariya Rastrow , Jonathan Jenner Macoskey
Abstract: Techniques for using multiple machine learning (ML) models, with varying compute costs, for ASR processing are described. The system may include an arbitrator component configured to determine which ML model is to be used to process an audio frame from a sequence of audio frames representing a spoken natural language input. The arbitrator component may switch between the ML models, on a frame-by-frame basis, to reduce an overall compute cost for the entire spoken natural language input. The outputs of the different ML models may be combined to determine the final output for the entire spoken natural language input.
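Frame-by-frame arbitration between models of different compute cost can be sketched as below. Everything specific here is an assumption made for illustration: the two stand-in models, their relative costs, and the use of mean frame energy as the routing criterion are hypothetical and are not taken from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
Model = Callable[[Frame], Tuple[str, int]]


# Hypothetical stand-ins for two ASR models; each returns
# (per-frame output, relative compute cost).
def small_model(frame: Frame) -> Tuple[str, int]:
    return "small", 1


def large_model(frame: Frame) -> Tuple[str, int]:
    return "large", 10


def arbitrate(frame: Frame, energy_threshold: float = 0.25) -> Model:
    """Pick a model for one frame; mean energy is an assumed routing criterion."""
    energy = sum(x * x for x in frame) / len(frame)
    return large_model if energy >= energy_threshold else small_model


def process(frames: List[Frame]) -> Tuple[List[str], int]:
    """Route each frame independently, then combine outputs in frame order."""
    outputs: List[str] = []
    total_cost = 0
    for frame in frames:
        output, cost = arbitrate(frame)(frame)
        outputs.append(output)
        total_cost += cost
    return outputs, total_cost


# Three hypothetical frames: quiet, loud, quiet.
frames = [[0.0, 0.1], [0.9, 0.8], [0.05, 0.0]]
outputs, total_cost = process(frames)
```

In this toy run only the loud frame goes to the expensive model, so the total cost is 12 units rather than the 30 that running the large model on every frame would take.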