-
Publication No.: US12014748B1
Publication date: 2024-06-18
Application No.: US16988423
Filing date: 2020-08-07
Applicant: Amazon Technologies, Inc.
Inventor: Ritwik Giri, Mehmet Umut Isik, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
IPC: G10L21/0208, G06N5/04, G06N20/00, G10L21/034
CPC classification number: G10L21/0208, G06N5/04, G06N20/00, G10L21/034, G10L2021/02082
Abstract: Techniques are described for training and using a machine learning model to estimate reverberation within a multi-task learning framework. According to some embodiments, the multi-task learning framework improves the performance of the machine learning model by estimating the amount of reverberation present in an input audio recording as a secondary task to the primary task of generating the clean speech portion of that recording. In one embodiment, a model architecture is selected that takes a noisy, reverberant recording as input and outputs an estimate of the clean (e.g., de-reverberated) signal, an estimate of the noise (e.g., background noise), and an estimate of the reverb-only portion. The secondary task of estimating the reverb-only portion acts as a regularizer that improves the model's performance in enhancing reverberant (and possibly noisy) input speech.
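A minimal sketch, in plain Python, of how the primary and secondary objectives in this abstract might be combined into a single training loss. The additive signal model (input = clean + reverb-only tail + noise), the mean-squared-error criterion, the function names, and the weights `w_noise` and `w_reverb` are all illustrative assumptions; the abstract states only that the reverb-only estimate acts as a regularizer alongside the clean-speech and noise estimates.

```python
from typing import List, Sequence, Tuple


def mse(est: Sequence[float], ref: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    if len(est) != len(ref) or not ref:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)


def make_training_example(
    clean: Sequence[float], reverb_only: Sequence[float], noise: Sequence[float]
) -> Tuple[List[float], Tuple[Sequence[float], Sequence[float], Sequence[float]]]:
    """Build one example under an ASSUMED additive model.

    input = clean + reverb-only tail + noise; the three components
    double as the model's three training targets.
    """
    mixture = [c + r + n for c, r, n in zip(clean, reverb_only, noise)]
    return mixture, (clean, noise, reverb_only)


def multitask_loss(
    clean_est: Sequence[float], noise_est: Sequence[float], reverb_est: Sequence[float],
    clean_ref: Sequence[float], noise_ref: Sequence[float], reverb_ref: Sequence[float],
    w_noise: float = 0.5, w_reverb: float = 0.5,  # hypothetical weights
) -> float:
    """Primary clean-speech loss plus weighted auxiliary losses.

    The reverb-only term is the secondary task that regularizes training.
    """
    primary = mse(clean_est, clean_ref)
    aux_noise = mse(noise_est, noise_ref)
    aux_reverb = mse(reverb_est, reverb_ref)
    return primary + w_noise * aux_noise + w_reverb * aux_reverb
```

Setting `w_reverb` to zero gives a model trained without the secondary task, which is the natural baseline for checking whether the reverb-estimation regularizer helps.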
-
Publication No.: US12008457B1
Publication date: 2024-06-11
Application No.: US17037515
Filing date: 2020-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
Abstract: Audio processing may be performed with a convolutional neural network that includes positional embeddings. Audio data may be received at an audio processing system. A convolutional neural network that concatenates frequency-positional embeddings at an input layer may be used to process the audio data. A result of processing the audio data through the convolutional neural network may be used to perform an audio processing task.
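A minimal sketch, in plain Python, of concatenating frequency-positional embeddings to a spectrogram at the input layer, so that the first convolution sees the audio features plus extra channels encoding each bin's frequency position. The abstract does not say how the embeddings are computed: the sinusoidal encoding of the frequency-bin index used here is an assumption (learned embeddings would be an equally plausible choice), as are the function names and the `[channel][time][frequency]` layout.

```python
import math
from typing import List, Sequence


def freq_positional_embedding(num_bins: int, dim: int) -> List[List[float]]:
    """ASSUMED sinusoidal embedding per frequency bin, shape [dim][num_bins].

    Even rows use sin, odd rows use cos, at geometrically spaced rates.
    """
    emb = []
    for d in range(dim):
        rate = 1.0 / (10000 ** (2 * (d // 2) / dim))
        row = []
        for f in range(num_bins):
            angle = f * rate
            row.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
        emb.append(row)
    return emb


def concat_freq_embeddings(spec: Sequence[Sequence[float]], dim: int) -> List[List[List[float]]]:
    """Stack a [time][freq] spectrogram with `dim` embedding channels.

    Returns a [1 + dim][time][freq] tensor (nested lists). Each embedding
    channel is identical across time frames and varies only along frequency.
    """
    if not spec or not spec[0]:
        raise ValueError("spectrogram must have at least one frame and one bin")
    num_frames, num_bins = len(spec), len(spec[0])
    emb = freq_positional_embedding(num_bins, dim)
    channels = [[list(frame) for frame in spec]]  # channel 0: the audio features
    for d in range(dim):
        channels.append([list(emb[d]) for _ in range(num_frames)])
    return channels
```

One likely motivation for this design: convolutions are translation-invariant along the frequency axis, so a kernel cannot otherwise tell a low band from a high one; the concatenated channels give it that position information without changing the network's structure.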
-
Publication No.: US11521637B1
Publication date: 2022-12-06
Application No.: US17037498
Filing date: 2020-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Jean-Marc Valin, Mehmet Umut Isik, Neerad Dilip Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy
IPC: G10L21/034, G06F3/16, G10L25/30
Abstract: Post-filtering may be performed for ratio masks as part of audio enhancement. Audio data may be received. A machine learning model may be applied to generate gain values for different spectrum bands of the audio data. The gain values may then be modified using an envelope post-filter according to a monotonically increasing function applied to the gain values to produce modified gain values used to generate an enhanced version of the audio data.
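A minimal sketch, in plain Python, of an envelope post-filter applied to per-band gains (ratio-mask values in [0, 1]) before they are applied to the audio. The abstract requires only a monotonically increasing function of the gain; the specific mapping g -> g * sin(pi * g / 2) used below is an assumed example, chosen because it is monotonically increasing on [0, 1], leaves gains of 0 and 1 unchanged, and pulls intermediate gains down, which deepens attenuation in noise-dominated bands. The band layout and helper names are hypothetical.

```python
import math
from typing import List, Sequence


def envelope_postfilter(gains: Sequence[float]) -> List[float]:
    """Map each band gain g in [0, 1] to g * sin(pi * g / 2).

    This function is an ASSUMED example of a monotonically increasing
    post-filter; it never raises a gain, only lowers intermediate ones.
    """
    out = []
    for g in gains:
        if not 0.0 <= g <= 1.0:
            raise ValueError("gains must lie in [0, 1]")
        out.append(g * math.sin(0.5 * math.pi * g))
    return out


def apply_band_gains(band_magnitudes: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Scale each band's magnitude by its (post-filtered) gain."""
    if len(band_magnitudes) != len(gains):
        raise ValueError("one gain per band is required")
    return [m * g for m, g in zip(band_magnitudes, gains)]
```

In use, the model's raw gains would pass through `envelope_postfilter` and the result through `apply_band_gains` to produce the enhanced band magnitudes.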
-