Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Nikko Strom"

1.

发明授权
Acoustic trigger detection 有权

公开(公告)号：US10460722B1

公开(公告)日：2019-10-29

申请号：US15639175

申请日：2017-06-30

Applicant: Amazon Technologies, Inc.

Inventor： Ming Sun , David Snyder , Yixin Gao , Nikko Strom , Spyros Matsoukas , Shiv Naga Prasad Vitaladevuni

IPC: G10L15/06 , G10L15/16 , H04M1/27 , G06F3/16 , G10L15/32 , G10L15/22

Abstract: A method for selective transmission of audio data to a speech processing server uses detection of an acoustic trigger in the audio data in determining the data to transmit. Detection of the acoustic trigger makes use of an efficient computation approach that reduces the amount of run-time computation required, or equivalently improves accuracy for a given amount of computation, by combining a “time delay” structure in which intermediate results of computations are reused at various time delays, thereby avoiding computation of computing new results, and decomposition of certain transformations to require fewer arithmetic operations without sacrificing significant performance. For a given amount of computation capacity the combination of these two techniques provides improved accuracy as compared to current approaches.

2.

发明申请
GENERATION OF PREDICTIVE NATURAL LANGUAGE PROCESSING MODELS 审中-公开

公开(公告)号：US20190180736A1

公开(公告)日：2019-06-13

申请号：US16102108

申请日：2018-08-13

Applicant: Amazon Technologies, Inc.

Inventor： William Folwell Barton , Rohit Prasad , Stephen Frederick Potter , Nikko Strom , Yuzo Watanabe , Madan Mohan Rao Jampani , Ariya Rastrow , Arushan Rajasekaram

IPC: G10L15/183

Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.

3.

发明授权
Estimating speaker-specific affine transforms for neural network based speech recognition systems 有权
Title translation: 基于神经网络的语音识别系统估计说话人特定的仿射变换

公开(公告)号：US09378735B1

公开(公告)日：2016-06-28

申请号：US14135474

申请日：2013-12-19

Applicant: Amazon Technologies, Inc.

Inventor： Sri Venkata Surya Siva Rama Krishna Garimella , Bjorn Hoffmeister , Nikko Strom

IPC: G10L15/16 , G10L15/06 , G10L15/20 , G10L13/08

CPC classification number: G10L15/16

Abstract: Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space (“LFBE” space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between corresponding linear and bias transform parts for the resultant neural network feature vector and some standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression (“cMLLR”) techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resultant transformed neural network feature and some standard speaker-specific feature obtained for a GMM-based acoustic model.

Abstract translation: 公开了用于估计Log Filter-Bank Energy Space（“LFBE”空间）中的仿射变换的特征，以便将基于人造神经网络的声学模型适应于新的扬声器或环境。可以使用连接的LFBE作为输入特征来训练基于神经网络的声学模型。仿射变换可以通过最小化用于所得到的神经网络特征向量的相应线性偏置变换部分和偏置变换部分之间的最小二乘误差来估计，以及使用约束最大似然线性回归（“ cMLLR“）技术。或者，可以通过最小化所得到的经变换的神经网络特征与为基于GMM的声学模型获得的某些标准的说话者特有特征之间的最小二乘误差来估计仿射变换。

4.

发明申请
SPEECH RECOGNIZER WITH MULTI-DIRECTIONAL DECODING 有权
Title translation: 具有多方向解码的语音识别器

公开(公告)号：US20150095026A1

公开(公告)日：2015-04-02

申请号：US14039383

申请日：2013-09-27

Applicant: Amazon Technologies, Inc.

Inventor： Michael Maximilian Emanuel Bisani , Nikko Strom , Bjorn Hoffmeister , Ryan Paul Thomas

IPC: G10L15/00 , G10L15/16

CPC classification number: G10L15/32 , G10L15/01 , G10L15/08 , G10L15/16 , G10L21/0272 , G10L25/78 , G10L2021/02166 , H04R1/406 , H04R3/005 , H04R2201/401 , H04R2410/01 , H04R2430/21

Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.

Abstract translation: 在自动语音识别（ASR）处理系统中，ASR处理可以被配置为基于从波束形成器接收的多个音频信道来处理语音。 ASR处理系统可以包括麦克风阵列和波束形成器以输出多个音频通道，使得每个通道在特定方向上隔离音频。多声道音频信号可以包括来自一个或多个扬声器的说话话音/语音以及不期望的音频，例如来自家用电器的噪声。 ASR设备可以同时对多声道音频执行语音识别，以提供更准确的语音识别结果。

5.

发明授权
Dialog management for multiple users 有权

公开(公告)号：US11908468B2

公开(公告)日：2024-02-20

申请号：US17112520

申请日：2020-12-04

Applicant: Amazon Technologies, Inc.

Inventor： Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Ying Shi , David Chi-Wai Tang , Nishtha Gupta , Aaron Challenner , Bonan Zheng , Angeliki Metallinou , Vincent Auvray , Minmin Shen

IPC: G10L25/78 , G10L15/22 , G10L15/24 , G10L15/08 , G10L15/06 , G06V40/20 , G06F3/16 , G10L13/08 , G10L15/20 , G06V40/10 , G06V10/40 , G10L15/02 , G06F18/24

CPC classification number: G10L15/22 , G06F3/167 , G06F18/24 , G06V10/40 , G06V40/10 , G06V40/20 , G10L13/08 , G10L15/02 , G10L15/063 , G10L15/08 , G10L15/20 , G10L15/222 , G10L15/24 , G10L2015/0635 , G10L2015/088 , G10L2015/223 , G10L2015/227

Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.

6.

发明授权
Deep multi-channel acoustic modeling using multiple microphone array geometries 有权

公开(公告)号：US11574628B1

公开(公告)日：2023-02-07

申请号：US16368331

申请日：2019-03-28

Applicant: Amazon Technologies, Inc.

Inventor： Kenichi Kumatani , Minhua Wu , Shiva Sundaram , Nikko Strom , Bjorn Hoffmeister

IPC: G10L15/16 , G10L25/30 , G10L15/02 , G06N3/08

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.

7.

发明授权
Device-directed utterance detection 有权

公开(公告)号：US11551685B2

公开(公告)日：2023-01-10

申请号：US16822744

申请日：2020-03-18

Applicant: Amazon Technologies, Inc.

Inventor： Ariya Rastrow , Eli Joshua Fidler , Roland Maximilian Rolf Maas , Nikko Strom , Aaron Eakin , Diamond Bishop , Bjorn Hoffmeister , Sanjeev Mishra

IPC: G10L15/22 , G10L15/18 , G10L15/26 , G10L15/08

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

8.

发明授权
Sentiment aware voice user interface 有权

公开(公告)号：US11508361B2

公开(公告)日：2022-11-22

申请号：US16889420

申请日：2020-06-01

Applicant: Amazon Technologies, Inc.

Inventor： Isaac Joseph Madwed , Julia Kennedy Nemer , Joo-Kyung Kim , Nikko Strom , Steven Mack Saunders , Laura Maggia Panfili , Anna Caitlin Jentoft , Sungjin Lee , David Thomas , Young-Bum Kim , Pablo Cesar Ganga , Chenlei Guo , Shuting Tang , Zhenyu Yao

IPC: G10L15/18 , G10L15/22 , G10L15/26

Abstract: Described herein is a system for responding to a frustrated user with a response determined based on spoken language understanding (SLU) processing of a user input. The system detects user frustration and responds to a repeated user input by confirming an action to be performed or presenting an alternative action, instead of performing the action responsive to the user input. The system also detects poor audio quality of the captured user input, and responds by requesting the user to repeat the user input. The system processes sentiment data and signal quality data to respond to user inputs.

9.

发明授权
Deep multi-channel acoustic modeling 有权

公开(公告)号：US11475881B2

公开(公告)日：2022-10-18

申请号：US16932049

申请日：2020-07-17

Applicant: Amazon Technologies, Inc.

Inventor： Arindam Mandal , Kenichi Kumatani , Nikko Strom , Minhua Wu , Shiva Sundaram , Bjorn Hoffmeister , Jeremie Lecomte

IPC: G10L15/16 , G10L15/22 , G06N3/08 , G10L15/06 , G10L15/30 , H04R3/00 , H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

10.

发明申请
DIALOG MANAGEMENT FOR MULTIPLE USERS 有权

公开(公告)号：US20220093093A1

公开(公告)日：2022-03-24

申请号：US17112227

申请日：2020-12-04

Applicant: Amazon Technologies, Inc.

Inventor： Prakash Krishnan , Arindam Mandal , Nikko Strom , Pradeep Natarajan , Ariya Rastrow , Shiv Naga Prasad Vitaladevuni , David Chi-Wai Tang , Aaron Challenner , Xu Zhang , Krishna Anisetty , Josey Diego Sandoval , Rohit Prasad , Premkumar Natarajan

IPC: G10L15/22 , G10L15/08 , G10L15/24 , G06K9/46 , G06K9/62 , G06K9/00 , G10L15/02

Abstract: A system can operate a speech-controlled device in a mode where the speech-controlled device determines that an utterance is directed at the speech-controlled device using image data showing the user speaking the utterance. If the user is directing the user's gaze at the speech-controlled device while speaking, the system may determine the utterance is system directed and thus may perform further speech processing based on the utterance. If the user's gaze is directed elsewhere, the system may determine the utterance is not system directed (for example directed at another user) and thus the system may not perform further speech processing based on the utterance and may take other actions, for example discarding audio data of the utterance.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification