Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Minhua Wu"

1.

Granted invention patent
Monophone-based background modeling for wakeword detection (Active)

Publication (announcement) number: US10964315B1

Publication (announcement) date: 2021-03-30

Application number: US15639330

Application date: 2017-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Minhua Wu, Sankaran Panchapagesan, Ming Sun, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Ryan Paul Thomas, Arindam Mandal

IPC: G10L15/22, G10L15/14, G10L15/02, G10L15/16, G10L25/30, G10L15/08

Abstract: An approach to wakeword detection uses an explicit representation of non-wakeword speech in the form of subword (e.g., phonetic monophone) units that do not necessarily occur in the wakeword and that broadly represent general speech. These subword units are arranged in a “background” model, which at runtime essentially competes with the wakeword model such that a wakeword is less likely to be declared as occurring when the input matches that background model well. An HMM may be used with the model to locate possible occurrences of the wakeword. Features are determined from portions of the input corresponding to subword units of the wakeword detected using the HMM. A secondary classifier is then used to process the features to yield a decision of whether the wakeword occurred.
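
Illustrative note (not part of the patent record): the competition between a wakeword model and a background model described in the abstract above can be pictured as a log-likelihood ratio test. The sketch below is a simplified assumption, not the patented method: it omits the HMM search and the secondary classifier, and the function names, the input scores, and the threshold are all hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def wakeword_score(wake_loglik, background_logliks):
    """Log-likelihood ratio of the wakeword model against the background.

    wake_loglik: log-likelihood of a segment under the wakeword model.
    background_logliks: log-likelihoods of the same segment under each
    background unit (hypothetical stand-ins for monophone models).
    """
    return wake_loglik - log_sum_exp(background_logliks)

def detect(wake_loglik, background_logliks, threshold=0.0):
    """Declare the wakeword only if it beats the background by `threshold`."""
    return wakeword_score(wake_loglik, background_logliks) > threshold
```

When the background units explain the input well, the ratio drops and a detection becomes less likely, which is the behavior the abstract describes.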

2.

Granted invention patent
Deep multi-channel acoustic modeling (Active)

Publication (announcement) number: US10726830B1

Publication (announcement) date: 2020-07-28

Application number: US16143910

Application date: 2018-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

IPC: G10L15/16, G10L15/22, G10L15/30, G10L15/06, G06N3/08, H04R3/00, H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

3.

Granted invention patent
Speech processing optimizations based on microphone array (Active)

Publication (announcement) number: US10679621B1

Publication (announcement) date: 2020-06-09

Application number: US15927764

Application date: 2018-03-21

Applicant: Amazon Technologies, Inc.

Inventor: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani

IPC: G10L15/22, G10L15/187, G10L15/26, G10L15/30, H04R3/00, G10L21/0208, G06F40/40, H04W4/02, G10L21/0216, G10L15/08

Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
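
Illustrative note (not part of the patent record): one simple way to picture "microphone configuration data as an input vector, along with the audio data" is to append a one-hot configuration code to every acoustic feature frame before it reaches the acoustic model. This is an assumption made for illustration only; the abstract does not specify the encoding, and the names `one_hot`, `append_config`, `config_id`, and `num_configs` are hypothetical.

```python
def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def append_config(frames, config_id, num_configs):
    """Concatenate a one-hot microphone-configuration code to each frame.

    frames: list of per-frame feature lists (e.g. filterbank energies).
    config_id: integer label of the array configuration (hypothetical).
    num_configs: number of known configurations (hypothetical).
    """
    config_vec = one_hot(config_id, num_configs)
    return [list(frame) + config_vec for frame in frames]
```

For example, `append_config([[0.1, 0.2]], 1, 3)` yields a single frame of length 5 whose last three entries identify configuration 1.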

4.

Granted invention patent
Deep multi-channel acoustic modeling using multiple microphone array geometries (Active)

Publication (announcement) number: US11574628B1

Publication (announcement) date: 2023-02-07

Application number: US16368331

Application date: 2019-03-28

Applicant: Amazon Technologies, Inc.

Inventor: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

IPC: G10L15/16, G10L25/30, G10L15/02, G06N3/08

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.

5.

Granted invention patent
Deep multi-channel acoustic modeling (Active)

Publication (announcement) number: US11475881B2

Publication (announcement) date: 2022-10-18

Application number: US16932049

Application date: 2020-07-17

Applicant: Amazon Technologies, Inc.

Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

IPC: G10L15/16, G10L15/22, G06N3/08, G10L15/06, G10L15/30, H04R3/00, H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

6.

Granted invention patent
Speech processing optimizations based on microphone array (Active)

Publication (announcement) number: US11935525B1

Publication (announcement) date: 2024-03-19

Application number: US16895377

Application date: 2020-06-08

Applicant: Amazon Technologies, Inc.

Inventor: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani

IPC: G10L15/22, G06F40/40, G10L15/187, G10L15/26, G10L15/30, G10L21/0208, H04R3/00, G10L15/08, G10L21/0216, H04W4/02

CPC classification number: G10L15/22, G06F40/40, G10L15/187, G10L15/26, G10L15/30, G10L21/0208, H04R3/005, G10L2015/088, G10L2015/223, G10L2021/02166, H04W4/025

Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.

7.

Granted invention patent
Deep multi-channel acoustic modeling using frequency aligned network (Active)

Publication (announcement) number: US11495215B1

Publication (announcement) date: 2022-11-08

Application number: US16710811

Application date: 2019-12-11

Applicant: Amazon Technologies, Inc.

Inventor: Minhua Wu, Shiva Sundaram, Tae Jin Park, Kenichi Kumatani

IPC: G10L21/00, G10L15/16, G10L15/06, G06N3/04, G10L21/0216, G06N3/08

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that includes a frequency aligned network (FAN) architecture. Thus, the first model may perform spatial filtering to generate a first feature vector by processing individual frequency bins separately, such that multiple frequency bins are not combined. The first feature vector may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
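
Illustrative note (not part of the patent record): "processing individual frequency bins separately, such that multiple frequency bins are not combined" resembles classical filter-and-sum beamforming in the frequency domain, where each bin has its own set of per-channel complex weights. The sketch below shows that classical operation under stated assumptions; it is not the FAN architecture itself, the weights are fixed rather than learned, and the function and argument names are hypothetical.

```python
def spatial_filter(channels, weights):
    """Combine microphone channels independently within each frequency bin.

    channels[c][k]: complex STFT value of channel c at frequency bin k.
    weights[k][c]: complex weight for bin k and channel c (hypothetical;
    in a learned front-end these would be trained parameters).
    Returns one complex value per bin; no information crosses bins.
    """
    return [
        sum(w.conjugate() * channels[c][k] for c, w in enumerate(bin_weights))
        for k, bin_weights in enumerate(weights)
    ]
```

Because the sum for bin `k` reads only `channels[c][k]`, the output at each frequency depends solely on the inputs at that same frequency.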

8.

Invention patent application
DEEP MULTI-CHANNEL ACOUSTIC MODELING (Under examination, published)

Publication (announcement) number: US20200349928A1

Publication (announcement) date: 2020-11-05

Application number: US16932049

Application date: 2020-07-17

Applicant: Amazon Technologies, Inc.

Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

IPC: G10L15/16, G10L15/22, G10L15/30, G06N3/08, H04R3/00, G10L15/06, H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
