-
Publication No.: US20200349928A1
Publication Date: 2020-11-05
Application No.: US16932049
Filing Date: 2020-07-17
Applicant: Amazon Technologies, Inc.
Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
-
Publication No.: US10726830B1
Publication Date: 2020-07-28
Application No.: US16143910
Filing Date: 2018-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
-
Publication No.: US11475881B2
Publication Date: 2022-10-18
Application No.: US16932049
Filing Date: 2020-07-17
Applicant: Amazon Technologies, Inc.
Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
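Illustrative sketch (not part of the patent record): the three-stage data flow described in the abstract can be pictured as a chain of small feed-forward layers. The Python below is a minimal sketch under stated assumptions, not the patented implementation. The layer sizes, the single dense layer per stage, the random placeholder weights, and all names (`multi_channel`, `feature_extraction`, `classification`, `forward`) are hypothetical. It shows only the forward pass from raw multi-channel samples to acoustic-unit posteriors; decoding posteriors into text and the joint training of the three models are omitted.

```python
import math
import random


def dense(weights, bias, x, activation=None):
    """Apply one fully connected layer: activation(W x + b)."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (placeholders for trained parameters)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def relu(v):
    return max(0.0, v)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes, chosen only for illustration.
NUM_CHANNELS = 2    # microphones in the array
FRAME_SAMPLES = 16  # raw samples per channel per frame
FIRST_DIM = 32      # first feature vector (beamformed-like features)
SECOND_DIM = 8      # second, lower-dimensional feature vector
NUM_UNITS = 5       # acoustic unit classes

rng = random.Random(0)
multi_channel = init_layer(FIRST_DIM, NUM_CHANNELS * FRAME_SAMPLES, rng)
feature_extraction = init_layer(SECOND_DIM, FIRST_DIM, rng)
classification = init_layer(NUM_UNITS, SECOND_DIM, rng)


def forward(frame):
    """frame: NUM_CHANNELS lists, each holding FRAME_SAMPLES raw samples."""
    raw = [s for channel in frame for s in channel]       # concatenate channels
    first = dense(*multi_channel, raw, relu)              # first model
    second = dense(*feature_extraction, first, relu)      # second model
    posteriors = softmax(dense(*classification, second))  # third model
    return first, second, posteriors


frame = [[rng.uniform(-1.0, 1.0) for _ in range(FRAME_SAMPLES)]
         for _ in range(NUM_CHANNELS)]
first, second, posteriors = forward(frame)
```

In a jointly optimized setup of the kind the abstract describes, a single speech-recognition loss on `posteriors` would presumably be backpropagated through all three parameter sets, rather than training the first stage separately against a signal-enhancement objective.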
-