Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Ariya Rastrow"

1.

Granted invention patent
Endpointing in speech processing (In force)

Publication (announcement) number: US12211517B1

Publication (announcement) date: 2025-01-28

Application number: US17475699

Application date: 2021-09-15

Applicant: Amazon Technologies, Inc.

Inventor: Roland Maximilian Rolf Maas, Bjorn Hoffmeister, Ariya Rastrow, James Garnet Droppo, Veerdhawal Pande, Maarten Van Segbroeck, Gautam Tiwari, Andrew Smith, Eli Joshua Fidler

IPC: G10L25/78, G06N3/045, G10L15/26, G10L25/30

Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally include determining an endpoint score representing a likelihood that the potential endpoint represents an end of speech representing a complete user input. When the potential endpoint has been determined, the system may publish a transcript of speech that preceded the potential endpoint, and send it to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete (e.g., represents the entirety of the user input). Final endpoint determinations may be made based on the results of the downstream processing, including automatic speech recognition, natural language understanding, etc.
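The flow in this abstract (propose a potential endpoint, publish the transcript so far, let a downstream check decide whether it is final) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: frames are modeled as a recognized word or `None` for silence, a potential endpoint is a fixed run of silent frames, and `is_complete` is a hypothetical stand-in for downstream language understanding. The endpoint score mentioned in the abstract is omitted, and the patent does not specify any of these details.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical: words that suggest the user has not finished speaking.
DANGLING_WORDS = {"and", "the", "to", "a", "of"}


def potential_endpoints(
    frames: List[Optional[str]], min_silence_frames: int = 3
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, transcript_so_far) at each potential endpoint.

    A frame is either a recognized word or None (silence). A potential
    endpoint is assumed to occur after min_silence_frames silent frames.
    """
    words: List[str] = []
    silence = 0
    for index, word in enumerate(frames):
        if word is None:
            silence += 1
            if silence == min_silence_frames and words:
                yield index, " ".join(words)
        else:
            words.append(word)
            silence = 0


def is_complete(transcript: str) -> bool:
    """Hypothetical downstream check: a transcript ending in a dangling
    word is treated as incomplete."""
    return transcript.split()[-1] not in DANGLING_WORDS


def final_endpoint(frames: List[Optional[str]]) -> Optional[Tuple[int, str]]:
    """Return the first potential endpoint whose transcript is complete."""
    for index, transcript in potential_endpoints(frames):
        if is_complete(transcript):
            return index, transcript
    return None


frames = ["turn", "on", "the", None, None, None,
          "kitchen", "lights", None, None, None]
candidates = list(potential_endpoints(frames))
result = final_endpoint(frames)
```

In this toy input the pause after "turn on the" produces a potential endpoint that the downstream check rejects, so transcription continues until the pause after "kitchen lights".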

2.

Invention publication
DEVICE-DIRECTED UTTERANCE DETECTION (Under examination, published)

Publication (announcement) number: US20230223023A1

Publication (announcement) date: 2023-07-13

Application number: US18149181

Application date: 2023-01-03

Applicant: Amazon Technologies, Inc.

Inventor: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra

IPC: G10L15/22, G10L15/18, G10L15/26, G10L15/08

CPC classification number: G10L15/22, G10L15/26, G10L15/1815, G10L2015/088, G10L2015/223, G10L2015/228

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes an on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

3.

Granted invention patent
Generation of automated message responses (In force)

Publication (announcement) number: US11496582B2

Publication (announcement) date: 2022-11-08

Application number: US16455604

Application date: 2019-06-27

Applicant: Amazon Technologies, Inc.

Inventor: Ariya Rastrow, Tony Hardie, Rohit Prasad

IPC: G10L15/22, H04L67/306, H04L51/02, H04M3/527, G10L13/00, H04M3/42

Abstract: Systems, methods, and devices for computer-generating responses and sending responses to communications when the recipient of the communication is unavailable are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes it difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message originating individual.

4.

Granted invention patent
Language model adaptation (In force)

Publication (announcement) number: US11302310B1

Publication (announcement) date: 2022-04-12

Application number: US16426557

Application date: 2019-05-30

Applicant: Amazon Technologies, Inc.

Inventor: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister

IPC: G10L15/01, G10L15/065, G10L15/06

Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.

5.

Invention application
DIALOG MANAGEMENT FOR MULTIPLE USERS (In force)

Publication (announcement) number: US20220093101A1

Publication (announcement) date: 2022-03-24

Application number: US17112520

Application date: 2020-12-04

Applicant: Amazon Technologies, Inc.

Inventor: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen

IPC: G10L15/22, G10L15/20, G06F3/16, G10L13/08

Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred-to item.
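The offset-time lookup described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the "stored data" is assumed to be a per-entry playback duration in seconds, and the function name `entry_at_offset` and the sample entries are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def entry_at_offset(
    entries: List[str], durations_s: List[float], offset_s: float
) -> str:
    """Return the entry that was being played offset_s seconds after
    playback began.

    durations_s[i] is the assumed playback duration of entries[i].
    """
    if not entries or len(entries) != len(durations_s):
        raise ValueError("entries and durations_s must be non-empty and aligned")
    # Cumulative end time of each entry, measured from the start of playback.
    end_times = list(accumulate(durations_s))
    # First entry whose end time lies strictly after the interruption.
    index = bisect_right(end_times, offset_s)
    # An interruption after playback finished refers to the last entry.
    index = min(index, len(entries) - 1)
    return entries[index]


entries = ["pizza place", "sushi bar", "taco stand"]
durations_s = [2.0, 1.5, 1.8]
selected = entry_at_offset(entries, durations_s, offset_s=2.7)
```

With these sample durations, an interruption 2.7 seconds into playback falls inside the second entry's window, so "that one" would resolve to "sushi bar".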

6.

Invention application
LANGUAGE AND GRAMMAR MODEL ADAPTATION (In force)

Publication (announcement) number: US20220036893A1

Publication (announcement) date: 2022-02-03

Application number: US17405677

Application date: 2021-08-18

Applicant: Amazon Technologies, Inc.

Inventor: Ankur Gandhe, Ariya Rastrow, Gautam Tiwari, Ashish Vishwanath Shenoy, Chun Chen

IPC: G10L15/193, G10L15/22

Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.

7.

Granted invention patent
Compressed finite state transducers for automatic speech recognition (In force)

Publication (announcement) number: US10381000B1

Publication (announcement) date: 2019-08-13

Application number: US15864689

Application date: 2018-01-08

Applicant: Amazon Technologies, Inc.

Inventor: Denis Sergeyevich Filimonov, Gautam Tiwari, Shaun Nidhiri Joseph, Ariya Rastrow

IPC: G10L15/00, G10L15/193, G10L15/18, G10L15/06, G10L15/02

Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
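The idea of binning weights to reduce the bits stored per weight can be illustrated with a small quantization sketch. The abstract does not say how the bins are chosen, so uniform-width bins over the observed weight range are an assumption here, and the function names `compress_weights` and `decompress_weights` are hypothetical.

```python
from typing import List, Tuple


def compress_weights(
    weights: List[float], num_bits: int
) -> Tuple[List[int], float, float]:
    """Map each weight to a num_bits-wide bin code.

    Assumes uniform-width bins between the smallest and largest weight.
    Returns (codes, lowest_weight, bin_width).
    """
    lowest, highest = min(weights), max(weights)
    num_bins = 2 ** num_bits
    width = (highest - lowest) / num_bins
    if width == 0.0:
        width = 1.0  # all weights identical; any positive width works
    # Clamp so the largest weight lands in the last bin, not one past it.
    codes = [min(int((w - lowest) / width), num_bins - 1) for w in weights]
    return codes, lowest, width


def decompress_weights(codes: List[int], lowest: float, width: float) -> List[float]:
    """Reconstruct each weight as the center of its bin."""
    return [lowest + (code + 0.5) * width for code in codes]


arc_weights = [0.12, 0.57, 2.31, 4.05, 7.93]
codes, lowest, width = compress_weights(arc_weights, num_bits=4)
restored = decompress_weights(codes, lowest, width)
```

Each weight is then stored as a 4-bit code instead of a full floating-point value, at the cost of a reconstruction error of at most half a bin width.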

8.

Granted invention patent
Lattice encoding using recurrent neural networks (In force)

Publication (announcement) number: US10176802B1

Publication (announcement) date: 2019-01-08

Application number: US15091722

Application date: 2016-04-06

Applicant: Amazon Technologies, Inc.

Inventor: Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Ariya Rastrow, Björn Hoffmeister, Lambert Mathias

IPC: G10L15/16, G10L19/038, G06N3/04

Abstract: An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be used by the system to perform additional operations, such as ASR results confirmation.

9.

Granted invention patent
Generation of predictive natural language processing models (In force)

Publication (announcement) number: US10049656B1

Publication (announcement) date: 2018-08-14

Application number: US14033346

Application date: 2013-09-20

Applicant: Amazon Technologies, Inc.

Inventor: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram

IPC: G10L15/18, G10L15/22, G10L15/02

Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.

10.

Granted invention patent
Speech processing techniques (In force)

Publication (announcement) number: US12205574B1

Publication (announcement) date: 2025-01-21

Application number: US17208615

Application date: 2021-03-22

Applicant: Amazon Technologies, Inc.

Inventor: Grant Strimel, Ariya Rastrow, Jonathan Jenner Macoskey

IPC: G10L15/06, G10L25/51

Abstract: Techniques for using multiple machine learning (ML) models, with varying compute costs, for ASR processing are described. The system may include an arbitrator component configured to determine which ML model is to be used to process an audio frame from a sequence of audio frames representing a spoken natural language input. The arbitrator component may switch between the ML models, on a frame-by-frame basis, to reduce an overall compute cost for the entire spoken natural language input. The outputs of the different ML models may be combined to determine the final output for the entire spoken natural language input.
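Frame-by-frame arbitration between a cheap and an expensive model can be sketched as below. This is a toy illustration only: the abstract does not state what the arbitrator bases its decision on, so the frame-energy threshold is an assumption, and the `Model` class, the placeholder `run` functions, and the cost values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


@dataclass
class Model:
    name: str
    cost: float                    # assumed relative compute cost per frame
    run: Callable[[Frame], float]  # placeholder for real model inference


def arbitrate(frame: Frame, cheap: Model, expensive: Model,
              energy_threshold: float) -> Model:
    """Pick a model for one frame. Assumption: low-energy frames are easy
    enough for the cheap model."""
    energy = sum(x * x for x in frame) / len(frame)
    return expensive if energy >= energy_threshold else cheap


def process(frames: List[Frame], cheap: Model, expensive: Model,
            energy_threshold: float = 0.1) -> Tuple[List[Tuple[str, float]], float]:
    """Run every frame through the chosen model and combine the per-frame
    outputs into one sequence, tracking total compute cost."""
    outputs: List[Tuple[str, float]] = []
    total_cost = 0.0
    for frame in frames:
        model = arbitrate(frame, cheap, expensive, energy_threshold)
        outputs.append((model.name, model.run(frame)))
        total_cost += model.cost
    return outputs, total_cost


small = Model("small", cost=1.0, run=lambda f: max(f))
large = Model("large", cost=10.0, run=lambda f: max(f))
frames = [[0.0, 0.01], [0.5, 0.6], [0.02, 0.0]]
outputs, total_cost = process(frames, small, large)
chosen = [name for name, _ in outputs]
```

Here only the middle frame is routed to the expensive model, so the total cost is lower than running the expensive model on all three frames.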
