Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Shiva Sundaram"

1.

Granted invention patent
Deep multi-channel acoustic modeling (In force)

Publication No.: US10726830B1

Publication date: 2020-07-28

Application No.: US16143910

Application date: 2018-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

IPC: G10L15/16, G10L15/22, G10L15/30, G10L15/06, G06N3/08, H04R3/00, H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
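The three-stage structure described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the layer shapes, the toy weights, and the function names are hypothetical, each "model" is reduced to a single linear layer, and the final step stops at an acoustic-unit posterior rather than generating text. The joint optimization mentioned in the abstract is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def multi_channel_stage(channels: List[Vector], w: Matrix) -> Vector:
    """First model: map the stacked raw channel frames to a first feature vector."""
    stacked = [sample for channel in channels for sample in channel]
    return relu(matvec(w, stacked))


def feature_extraction_stage(first: Vector, w: Matrix) -> Vector:
    """Second model: project to a lower-dimensional second feature vector."""
    return relu(matvec(w, first))


def classification_stage(second: Vector, w: Matrix) -> Vector:
    """Third model: posterior over (hypothetical) acoustic units."""
    return softmax(matvec(w, second))


def forward(channels: List[Vector], w1: Matrix, w2: Matrix,
            w3: Matrix) -> Tuple[Vector, Vector, Vector]:
    first = multi_channel_stage(channels, w1)
    second = feature_extraction_stage(first, w2)
    return first, second, classification_stage(second, w3)


# Toy weights (assumed): 2 channels x 2 samples -> 4 -> 2 -> 3 acoustic units.
W1 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
W2 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
W3 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

first, second, posterior = forward([[0.2, -0.1], [0.4, 0.3]], W1, W2, W3)
print(len(first), len(second), posterior.index(max(posterior)))
```

In a jointly optimized system the gradient of the classification loss would flow back through all three stages; this sketch only shows the forward data flow and the dimensionality reduction between the first and second feature vectors.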

2.

Granted invention patent
Deep multi-channel acoustic modeling using multiple microphone array geometries (In force)

Publication No.: US11574628B1

Publication date: 2023-02-07

Application No.: US16368331

Application date: 2019-03-28

Applicant: Amazon Technologies, Inc.

Inventor: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

IPC: G10L15/16, G10L25/30, G10L15/02, G06N3/08

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
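The "generate multiple outputs, then select the best" step in this abstract can be sketched as follows. The abstract does not state the selection criterion, so output energy is used here purely as a placeholder assumption. The branch functions stand in for geometry-specific processing and are hypothetical, as are all names below.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Branch = Callable[[List[Vector]], Vector]


def average_channels(channels: List[Vector]) -> Vector:
    """Stand-in for a geometry-specific front end: average the given channels."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]


def energy(v: Vector) -> float:
    """Placeholder selection score (an assumption, not taken from the patent)."""
    return sum(x * x for x in v)


def select_best_output(channels: List[Vector],
                       branches: Dict[str, Branch],
                       score: Callable[[Vector], float] = energy
                       ) -> Tuple[str, Vector]:
    """Run every geometry branch and keep the highest-scoring output."""
    outputs = {name: branch(channels) for name, branch in branches.items()}
    best = max(outputs, key=lambda name: score(outputs[name]))
    return best, outputs[best]


# Two hypothetical geometries: one uses only the first two microphones,
# the other uses however many channels are available.
branches = {
    "two_mic": lambda chans: average_channels(chans[:2]),
    "all_mics": lambda chans: average_channels(chans),
}

name, feature = select_best_output(
    [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]], branches)
print(name, feature)
```

Because each branch slices the channel list itself, the same call works for inputs with a different number of microphone channels.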

3.

Granted invention patent
Deep multi-channel acoustic modeling (In force)

Publication No.: US11475881B2

Publication date: 2022-10-18

Application No.: US16932049

Application date: 2020-07-17

Applicant: Amazon Technologies, Inc.

Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

IPC: G10L15/16, G10L15/22, G06N3/08, G10L15/06, G10L15/30, H04R3/00, H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

4.

Granted invention patent
Method and system for beam selection in microphone array beamformers (In force)

Publication No.: US09837099B1

Publication date: 2017-12-05

Application No.: US15250659

Application date: 2016-08-29

Applicant: Amazon Technologies, Inc.

Inventor: Shiva Sundaram, Amit Singh Chhetri, Ramya Gopalan, Philip Ryan Hilmes

IPC: H04R3/00, G10L21/028, G10L25/84, G10L25/72, G10L21/0216

CPC classification number: G10L21/028, G10L25/72, G10L25/84, G10L2021/02166, H04R1/406, H04R3/005, H04R25/405, H04R25/407, H04R2430/23

Abstract: Embodiments of systems and methods are described for determining which of a plurality of beamformed audio signals to select for signal processing. In some embodiments, a plurality of audio input signals are received from a microphone array comprising a plurality of microphones. A plurality of beamformed audio signals are determined based on the plurality of input audio signals, the beamformed audio signals comprising a direction. A plurality of signal features may be determined for each beamformed audio signal. Smoothed features may be determined for each beamformed audio signal based on at least a portion of the plurality of signal features. The beamformed audio signal corresponding to the maximum smoothed feature may be selected for further processing.
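The selection rule in this abstract (smooth a per-beam signal feature, then pick the beam with the maximum smoothed value) can be sketched as below. The abstract does not say which feature or which smoothing method is used, so exponential smoothing over a generic per-frame feature is an assumption, and the function names are hypothetical.

```python
from typing import List


def smooth(features: List[float], alpha: float = 0.25) -> float:
    """Exponentially smooth one beam's per-frame features (assumed method)."""
    if not features:
        raise ValueError("each beam needs at least one feature value")
    smoothed = features[0]
    for value in features[1:]:
        smoothed = alpha * value + (1.0 - alpha) * smoothed
    return smoothed


def select_beam(feature_history: List[List[float]], alpha: float = 0.25) -> int:
    """Return the index of the beam with the largest smoothed feature."""
    smoothed = [smooth(history, alpha) for history in feature_history]
    return max(range(len(smoothed)), key=smoothed.__getitem__)


# Beam 0 spikes in the last frame; beam 1 is steadily strong.
history = [
    [1.0, 1.0, 1.0, 9.0],
    [4.0, 4.0, 4.0, 4.0],
]
print(select_beam(history))
```

Smoothing keeps a single-frame spike on one beam from overriding a beam that has been consistently strong, which is one plausible reason to select on smoothed rather than instantaneous features.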

5.

Granted invention patent
Feedback based beamformed signal selection (In force)

Publication No.: US09734822B1

Publication date: 2017-08-15

Application No.: US14727504

Application date: 2015-06-01

Applicant: Amazon Technologies, Inc.

Inventor: Shiva Sundaram, Ramya Gopalan

IPC: G10L17/22, H04R1/08, G10L15/08, G10L15/02, G10L21/00, G10L15/04

CPC classification number: G10L15/08, G10L17/22, H04R1/08, H04R3/005, H04R2430/00, H04R2430/20

Abstract: Features are disclosed for improving the accuracy and stability of beamformed signal selection. The selection may consider processing feedback information to identify when the current beam selection may need to be re-evaluated. The feedback information may further be used to select a beamformed signal for processing. For example, beams which detect wake-words or yield high confidence speech recognition may be favored over beams which fail to detect or recognize at a lower confidence level.
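One way to picture feedback-driven selection is the sketch below. It assumes a simple additive score (signal score plus a wake-word bonus plus recognition confidence) and a confidence threshold that triggers re-evaluation. None of these specifics come from the abstract; the names, weights, and threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BeamFeedback:
    wake_word_detected: bool
    asr_confidence: float  # assumed to lie in [0, 1]


def feedback_score(signal_score: float, fb: BeamFeedback,
                   wake_bonus: float = 1.0, conf_weight: float = 1.0) -> float:
    """Combine a signal-level score with downstream feedback (assumed weighting)."""
    bonus = wake_bonus if fb.wake_word_detected else 0.0
    return signal_score + bonus + conf_weight * fb.asr_confidence


def choose_beam(current: int, signal_scores: List[float],
                feedback: List[BeamFeedback],
                reeval_threshold: float = 0.5) -> int:
    """Keep the current beam while its feedback is good; otherwise re-select."""
    if feedback[current].asr_confidence >= reeval_threshold:
        return current
    scores = [feedback_score(s, fb) for s, fb in zip(signal_scores, feedback)]
    return max(range(len(scores)), key=scores.__getitem__)


signal_scores = [0.9, 0.8, 0.7]
feedback = [
    BeamFeedback(False, 0.2),
    BeamFeedback(True, 0.9),
    BeamFeedback(False, 0.6),
]
print(choose_beam(0, signal_scores, feedback))  # beam 0 has low confidence
print(choose_beam(2, signal_scores, feedback))  # beam 2 is above the threshold
```

Holding the current beam until its feedback degrades is a design choice aimed at the stability goal stated in the abstract, since it avoids switching beams on every frame.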

6.

Granted invention patent
Deep multi-channel acoustic modeling using frequency aligned network (In force)

Publication No.: US11495215B1

Publication date: 2022-11-08

Application No.: US16710811

Application date: 2019-12-11

Applicant: Amazon Technologies, Inc.

Inventor: Minhua Wu, Shiva Sundaram, Tae Jin Park, Kenichi Kumatani

IPC: G10L21/00, G10L15/16, G10L15/06, G06N3/04, G10L21/0216, G06N3/08

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that includes a frequency aligned network (FAN) architecture. Thus, the first model may perform spatial filtering to generate a first feature vector by processing individual frequency bins separately, such that multiple frequency bins are not combined. The first feature vector may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
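The key constraint in this abstract, that each frequency bin is processed separately and bins are never combined, can be shown with a small sketch. It assumes a filter-and-sum form with one complex weight per channel per bin and a power output per bin. The actual FAN layers are not described in the abstract, so this form and all names are hypothetical.

```python
from typing import List


def frequency_aligned_filter(spectra: List[List[complex]],
                             weights: List[List[complex]]) -> List[float]:
    """Per-bin spatial filtering.

    spectra[c][k] is channel c at frequency bin k; weights[k][c] is the
    weight for channel c at bin k. Output bin k depends only on input bin k.
    """
    num_channels = len(spectra)
    powers = []
    for k, bin_weights in enumerate(weights):
        combined = sum(bin_weights[c].conjugate() * spectra[c][k]
                       for c in range(num_channels))
        powers.append(abs(combined) ** 2)
    return powers


# Two channels, two frequency bins, equal weights (toy values).
weights = [[complex(0.5, 0.0), complex(0.5, 0.0)],
           [complex(0.5, 0.0), complex(0.5, 0.0)]]
spectra = [[1 + 0j, 2 + 0j],
           [1 + 0j, 0 + 2j]]
print(frequency_aligned_filter(spectra, weights))
```

Changing any channel's value at bin 1 leaves the output at bin 0 untouched, which is the "frequency aligned" property the abstract describes.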

7.

Invention patent application
DEEP MULTI-CHANNEL ACOUSTIC MODELING (Pending, published)

Publication No.: US20200349928A1

Publication date: 2020-11-05

Application No.: US16932049

Application date: 2020-07-17

Applicant: Amazon Technologies, Inc.

Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

IPC: G10L15/16, G10L15/22, G10L15/30, G06N3/08, H04R3/00, G10L15/06, H04R1/40

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

8.

Granted invention patent
Method and system for beam selection in microphone array beamformers (In force)

Publication No.: US09432769B1

Publication date: 2016-08-30

Application No.: US14447498

Application date: 2014-07-30

Applicant: Amazon Technologies, Inc.

Inventor: Shiva Sundaram, Amit Singh Chhetri, Ramya Gopalan, Philip Ryan Hilmes

IPC: H04R3/00

CPC classification number: G10L21/028, G10L25/72, G10L25/84, G10L2021/02166, H04R1/406, H04R3/005, H04R25/405, H04R25/407, H04R2430/23

Abstract: Embodiments of systems and methods are described for determining which of a plurality of beamformed audio signals to select for signal processing. In some embodiments, a plurality of audio input signals are received from a microphone array comprising a plurality of microphones. A plurality of beamformed audio signals are determined based on the plurality of input audio signals, the beamformed audio signals comprising a direction. A plurality of signal features may be determined for each beamformed audio signal. Smoothed features may be determined for each beamformed audio signal based on at least a portion of the plurality of signal features. The beamformed audio signal corresponding to the maximum smoothed feature may be selected for further processing.