Patent search ap:("AMAZON TECHNOLOGIES Page INC.") AND inv:"Bjorn Hoffmeister"

21.

发明授权
Deep multi-channel acoustic modeling 有权

公开(公告)号：US10726830B1

公开(公告)日：2020-07-28

申请号：US16143910

申请日：2018-09-27

Applicant: Amazon Technologies, Inc.

Inventor： Arindam Mandal , Kenichi Kumatani , Nikko Strom , Minhua Wu , Shiva Sundaram , Bjorn Hoffmeister , Jeremie Lecomte

IPC: G10L15/16 , G10L15/22 , G10L15/30 , G10L15/06 , G06N3/08 , H04R3/00 , H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

22.

发明申请
ANCHORED SPEECH DETECTION AND SPEECH RECOGNITION 审中-公开

公开(公告)号：US20200035231A1

公开(公告)日：2020-01-30

申请号：US16437763

申请日：2019-06-11

Applicant: Amazon Technologies, Inc.

Inventor： Sree Hari Krishnan Parthasarathi , Bjorn Hoffmeister , Brian King , Roland Maas

IPC: G10L15/20 , G10L15/02 , G10L17/06 , G10L25/87 , G10L15/08 , G10L15/16 , G10L17/18

Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.

23.

发明授权
Anchored speech detection and speech recognition 有权

公开(公告)号：US10373612B2

公开(公告)日：2019-08-06

申请号：US15196228

申请日：2016-06-29

Applicant: Amazon Technologies, Inc.

Inventor： Sree Hari Krishnan Parthasarathi , Bjorn Hoffmeister , Brian King , Roland Maas

IPC: G10L15/02 , G10L15/08 , G10L15/16 , G10L15/20 , G10L17/02 , G10L17/06 , G10L17/18 , G10L25/78 , G10L25/87

Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.

24.

发明授权
Low latency and memory efficient keywork spotting 有权
Title translation: 低延迟和内存高效的密钥检测

公开(公告)号：US09390708B1

公开(公告)日：2016-07-12

申请号：US13903814

申请日：2013-05-28

Applicant: Amazon Technologies, Inc.

Inventor： Bjorn Hoffmeister

IPC: G10L15/00 , G10L15/04 , G10L15/28 , G10L21/00 , G10L25/00 , G10L21/06 , G10L15/14 , G10L15/02

CPC classification number: G10L15/02 , G10L15/08 , G10L15/142 , G10L15/22 , G10L2015/088 , G10L2015/223

Abstract: Features are disclosed for spotting keywords in utterance audio data without requiring the entire utterance to first be processed. Likelihoods that a portion of the utterance audio data corresponds to the keyword may be compared to likelihoods that the portion corresponds to background audio (e.g., general speech and/or non-speech sounds). The difference in the likelihoods may be determined, and keyword may be triggered when the difference exceeds a threshold, or shortly thereafter. Traceback information and other data may be stored during the process so that a second speech processing pass may be performed. For efficient management of system memory, traceback information may only be stored for those frames that may encompass a keyword; the traceback information for older frames may be overwritten by traceback information for newer frames.

Abstract translation: 公开了用于在话音音频数据中发现关键字的特征，而不需要首先处理整个话语。话音音频数据的一部分对应于该关键字的可能性可以与该部分对应于背景音频（例如，一般语音和/或非语音）的似然性进行比较。可以确定可能性的差异，并且当差异超过阈值时或之后不久可以触发关键字。跟踪信息和其他数据可以在该处理期间被存储，从而可以执行第二语音处理通行证。为了有效地管理系统存储器，回溯信息可能仅存储在可能包含关键字的那些帧中; 旧帧的追溯信息可能被较新帧的追溯信息覆盖。

25.

发明授权
Acoustic echo cancellation and automatic speech recognition with random noise 有权
Title translation: 声回波消除和随机噪声的自动语音识别

公开(公告)号：US09286883B1

公开(公告)日：2016-03-15

申请号：US14038319

申请日：2013-09-26

Applicant: Amazon Technologies, Inc.

Inventor： Robert Ayrapetian , Bjorn Hoffmeister

IPC: H04B3/20 , G10K11/178

CPC classification number: G10L21/0208 , G10L15/20 , G10L2021/02082 , H04M9/082

Abstract: Features are disclosed for performing acoustic echo cancellation using random noise. The output may be used to perform speech recognition. Random noise may be introduced into a reference signal path and into a microphone signal path. The random noise introduced into the microphone signal path may be transformed based on an estimated echo path and then combined with microphone output. The random noise introduced into the reference signal path may be combined with a reference signal and then transformed. In some embodiments, the random noise in the reference signal path may be used in the absence of another reference signal, allowing the acoustic echo canceler to be continuously trained.

Abstract translation: 公开了用于使用随机噪声执行声学回波消除的特征。该输出可用于执行语音识别。可将随机噪声引入参考信号路径并传入麦克风信号路径。引入麦克风信号路径的随机噪声可以基于估计的回波路径而变换，然后与麦克风输出组合。引入参考信号路径的随机噪声可以与参考信号组合，然后变换。在一些实施例中，参考信号路径中的随机噪声可以在不存在另一个参考信号的情况下使用，从而允许连续训练声学回波消除器。

26.

发明授权
Local speech recognition of frequent utterances 有权
Title translation: 频繁话语的本地语音识别

公开(公告)号：US09070367B1

公开(公告)日：2015-06-30

申请号：US13684969

申请日：2012-11-26

Applicant: Amazon Technologies, Inc.

Inventor： Bjorn Hoffmeister , Jeffrey O'Neill

IPC: G10L15/04 , G10L15/187

CPC classification number: G10L15/187 , G10L15/063 , G10L15/30

Abstract: In a distributed automated speech recognition (ASR) system, speech models may be employed on a local device to allow the local device to process frequently spoken utterances while passing other utterances to a remote device for processing. Upon receiving an audio signal, the local device compares the audio signal to the speech models of the frequently spoken utterances to determine whether the audio signal matches one of the speech models. When the audio signal matches one of the speech models, the local device processes the utterance, for example by executing a command. When the audio signal does not match one of the speech models, the local device transmits the audio signal to a second device for ASR processing. This reduces latency and the amount of audio signals that are sent to the second device for ASR processing.

Abstract translation: 在分布式自动语音识别（ASR）系统中，可以在本地设备上使用语音模型，以允许本地设备处理频繁的口语话语，同时将其他话语传送到远程设备进行处理。在接收到音频信号时，本地设备将音频信号与经常说出的话语的语音模型进行比较，以确定音频信号是否与语音模型中的一个匹配。当音频信号与语音模型中的一个匹配时，本地设备例如通过执行命令来处理话音。当音频信号与语音模型之一不匹配时，本地设备将音频信号发送到用于ASR处理的第二设备。这减少了延迟和发送到第二设备的ASR处理的音频信号量。

27.

发明授权
Language model adaptation 有权

公开(公告)号：US12014726B2

公开(公告)日：2024-06-18

申请号：US17706057

申请日：2022-03-28

Applicant: Amazon Technologies, Inc.

Inventor： Ankur Gandhe , Ariya Rastrow , Roland Maximilian Rolf Maas , Bjorn Hoffmeister

IPC: G10L15/22 , G10L15/01 , G10L15/06 , G10L15/065

CPC classification number: G10L15/065 , G10L15/01 , G10L15/063

Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.

28.

发明授权
Deep multi-channel acoustic modeling using multiple microphone array geometries 有权

公开(公告)号：US11574628B1

公开(公告)日：2023-02-07

申请号：US16368331

申请日：2019-03-28

Applicant: Amazon Technologies, Inc.

Inventor： Kenichi Kumatani , Minhua Wu , Shiva Sundaram , Nikko Strom , Bjorn Hoffmeister

IPC: G10L15/16 , G10L25/30 , G10L15/02 , G06N3/08

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.

29.

发明授权
Device-directed utterance detection 有权

公开(公告)号：US11551685B2

公开(公告)日：2023-01-10

申请号：US16822744

申请日：2020-03-18

Applicant: Amazon Technologies, Inc.

Inventor： Ariya Rastrow , Eli Joshua Fidler , Roland Maximilian Rolf Maas , Nikko Strom , Aaron Eakin , Diamond Bishop , Bjorn Hoffmeister , Sanjeev Mishra

IPC: G10L15/22 , G10L15/18 , G10L15/26 , G10L15/08

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

30.

发明授权
Deep multi-channel acoustic modeling 有权

公开(公告)号：US11475881B2

公开(公告)日：2022-10-18

申请号：US16932049

申请日：2020-07-17

Applicant: Amazon Technologies, Inc.

Inventor： Arindam Mandal , Kenichi Kumatani , Nikko Strom , Minhua Wu , Shiva Sundaram , Bjorn Hoffmeister , Jeremie Lecomte

IPC: G10L15/16 , G10L15/22 , G06N3/08 , G10L15/06 , G10L15/30 , H04R3/00 , H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification