-
Publication No.: US12014748B1
Publication date: 2024-06-18
Application No.: US16988423
Filing date: 2020-08-07
Applicant: Amazon Technologies, Inc.
Inventor: Ritwik Giri, Mehmet Umut Isik, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
IPC: G10L21/0208, G06N5/04, G06N20/00, G10L21/034
CPC classification number: G10L21/0208, G06N5/04, G06N20/00, G10L21/034, G10L2021/02082
Abstract: Techniques for training and using a machine learning model for estimation of reverberation in a multi-task learning framework are described. According to some embodiments, the multi-task learning framework improves the performance of the machine learning model by estimating the amount of reverberation present in an input audio recording as a secondary task to the primary task of generating a clean speech portion of the input audio recording. In one embodiment, a model architecture is selected that takes a noisy reverberant recording as an input and outputs an estimate of a clean (e.g., de-reverberated) signal, an estimate of noise (e.g., background noise), and an estimate of the reverb-only portion. The secondary task of estimating the reverb-only portion acts as a regularizer that improves the machine learning model's performance in enhancing the reverberant (and possibly noisy) input speech.
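The multi-task objective described in this abstract can be illustrated with a short sketch. The abstract does not specify the loss, so the mean squared error terms, the function names, and the weights `w_noise` and `w_reverb` below are assumptions chosen only for illustration.

```python
def mse(est, ref):
    """Mean squared error between two equal-length sequences."""
    if len(est) != len(ref):
        raise ValueError("length mismatch")
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(est)


def multitask_loss(est_clean, est_noise, est_reverb,
                   ref_clean, ref_noise, ref_reverb,
                   w_noise=0.5, w_reverb=0.5):
    """Primary clean-speech loss plus weighted secondary losses.

    The weights and the use of MSE are hypothetical; the abstract only
    states that reverb estimation acts as a regularizing secondary task.
    """
    primary = mse(est_clean, ref_clean)          # primary task: clean speech
    noise_term = mse(est_noise, ref_noise)       # secondary output: noise
    reverb_term = mse(est_reverb, ref_reverb)    # secondary task: reverb-only part
    return primary + w_noise * noise_term + w_reverb * reverb_term
```

With perfect clean and noise estimates, only the reverb term contributes, which shows how the secondary task adds a penalty on top of the primary objective.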
-
Publication No.: US11521637B1
Publication date: 2022-12-06
Application No.: US17037498
Filing date: 2020-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Jean-Marc Valin, Mehmet Umut Isik, Neerad Dilip Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy
IPC: G10L21/034, G06F3/16, G10L25/30
Abstract: Post-filtering may be performed for ratio masks as part of audio enhancement. Audio data may be received. A machine learning model may be applied to generate gain values for different spectrum bands of the audio data. The gain values may then be modified using an envelope post-filter according to a monotonically increasing function applied to the gain values to produce modified gain values used to generate an enhanced version of the audio data.
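As a rough illustration of the post-filtering step, the sketch below warps per-band gains with a monotonically increasing function. The abstract does not name the function; the choice `g * sin(pi/2 * g)` and the name `postfilter_gains` are assumptions. This particular function maps 0 to 0 and 1 to 1, is increasing on [0, 1], and attenuates low gains more strongly than high ones.

```python
import math


def postfilter_gains(gains):
    """Warp per-band gains in [0, 1] with a monotonically increasing function.

    The warping function g * sin(pi/2 * g) is an illustrative assumption,
    not a function stated in the abstract.
    """
    warped = []
    for g in gains:
        g = min(max(g, 0.0), 1.0)  # keep gains in the valid ratio-mask range
        warped.append(g * math.sin(0.5 * math.pi * g))
    return warped
```

The modified gains would then be applied to the corresponding spectrum bands to produce the enhanced audio.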
-
Publication No.: US12167223B2
Publication date: 2024-12-10
Application No.: US17810303
Filing date: 2022-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin
IPC: H04S7/00, G10L21/0216, H04S1/00
Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.
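The downmix, enhance, and upmix flow can be sketched as follows. This is a simplified illustration under stated assumptions: the spatial cue is reduced to one least-squares gain per channel estimated from the whole input (the abstract describes cues for the target speaker), and `enhance_mono` is a hypothetical placeholder for the low-complexity model.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def enhance_stereo(left, right, enhance_mono):
    """Downmix to mono, enhance, then upmix using per-channel gains."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    # Downmix: average of the two channels.
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    energy = dot(mono, mono)
    if energy > 0.0:
        # Assumed spatial cue: least-squares gain of each channel w.r.t. mono.
        gain_l = dot(left, mono) / energy
        gain_r = dot(right, mono) / energy
    else:
        gain_l = gain_r = 1.0
    enhanced = enhance_mono(mono)  # placeholder for the mono enhancement model
    # Upmix: re-apply the estimated cues to the enhanced mono signal.
    return [gain_l * s for s in enhanced], [gain_r * s for s in enhanced]
```

Because the model runs once on the mono signal rather than once per channel, the per-frame cost stays close to that of mono enhancement.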
-
Publication No.: US11924367B1
Publication date: 2024-03-05
Application No.: US17668297
Filing date: 2022-02-09
Applicant: Amazon Technologies, Inc.
Inventor: Jean-Marc Valin, Karim Helwani, Srikanth Venkata Tenneti, Erfan Soltanmohammadi, Mehmet Umut Isik, Richard Newman, Michael Mark Goodwin, Arvindh Krishnaswamy
IPC: H04M3/00, G10L21/0232, G10L21/034, G10L25/18, H04S3/00, G10L21/0208
CPC classification number: H04M3/002, G10L21/0232, G10L21/034, G10L25/18, H04S3/008, G10L2021/02082, H04S2400/01, H04S2400/03
Abstract: Joint noise and echo suppression may be performed for enhancing two-way audio communications. Audio data captured at a communication device and audio data transmitted to the communication device from another communication device are used as input features to a trained machine learning model. The model uses the transmitted audio data as a reference signal to eliminate residual echo in the captured audio data while also suppressing noise in the captured audio data.
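One way to picture the joint input is to concatenate features of the captured signal with features of the far-end reference and let a single model predict per-band gains. In the sketch below, the function names, the per-band energy features, and `toy_model` are hypothetical; `toy_model` is a hand-written stand-in, not a trained model.

```python
def suppress_noise_and_echo(mic_bands, far_end_bands, model):
    """Apply model-predicted per-band gains to the captured bands."""
    # Captured and far-end (reference) features form one input vector.
    features = list(mic_bands) + list(far_end_bands)
    gains = model(features)
    if len(gains) != len(mic_bands):
        raise ValueError("model must return one gain per captured band")
    return [min(max(g, 0.0), 1.0) * x for g, x in zip(gains, mic_bands)]


def toy_model(features):
    """Hypothetical stand-in: attenuate bands where the reference is strong.

    Assumes non-negative band energies and an even-length feature vector.
    """
    half = len(features) // 2
    mic, ref = features[:half], features[half:]
    return [m / (m + r) if (m + r) > 0 else 0.0 for m, r in zip(mic, ref)]
```

For example, `suppress_noise_and_echo([1.0, 2.0], [0.0, 2.0], toy_model)` leaves the first band untouched and halves the second, where the far-end reference is active.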
-
Publication No.: US20240007817A1
Publication date: 2024-01-04
Application No.: US17810303
Filing date: 2022-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin
IPC: H04S7/00, H04S1/00, G10L21/0216
CPC classification number: H04S7/303, H04S1/007, G10L21/0216, H04S2400/03, H04S2400/11, H04S2400/15
Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.
-
Publication No.: US12272371B1
Publication date: 2025-04-08
Application No.: US17364805
Filing date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Mehmet Umut Isik, Arvindh Krishnaswamy
IPC: G06F17/00, G06N20/00, G10L21/013, G10L21/0364, G10L21/038
Abstract: Real-time audio enhancement for a target speaker may be performed. An embedding of a sample of speaker audio is created using a trained neural network that performs voice identification. The embedding is then concatenated with the input features of a trained machine learning model for audio enhancement. The audio enhancement model can recognize and enhance a target speaker's speech in a real-time implementation, as the embedding is in the same feature space as the audio enhancement model.
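The concatenation step in this abstract amounts to appending the same speaker embedding to the feature vector of every frame. A minimal sketch, assuming per-frame feature lists and a fixed-length embedding (the function name `condition_on_speaker` is hypothetical):

```python
def condition_on_speaker(frames, speaker_embedding):
    """Append one fixed speaker embedding to each frame's feature vector."""
    embedding = list(speaker_embedding)
    # The embedding is computed once (e.g., at enrollment) and reused per frame.
    return [list(frame) + embedding for frame in frames]
```

Each conditioned frame then has length `len(frame) + len(speaker_embedding)`, which is the input width the enhancement model would need to accept.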
-
Publication No.: US20250111857A1
Publication date: 2025-04-03
Application No.: US18478759
Filing date: 2023-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Ritwik Giri, Zhepei Wang, Devansh Shah, Jean-Marc Valin, Michael Mark Goodwin
IPC: G10L21/0208, G10L25/30, H04M3/56
Abstract: Examples herein provide an approach to enhance an audio mixture of a teleconference application by switching between noise suppression modes using a single model. Specifically, a machine learning (ML) model may be configured to, in response to receiving an audio mixture representation as input, suppress either background noise of the audio mixture or all noise of the audio mixture except a user's voice. In some examples, the ML model may be trained on speech and background noise training data during a training phase. The ML model may also be trained on a user's voice during an enrollment phase. During an inference phase, the ML model may enhance the audio mixture by suppressing a portion of the audio mixture.
-
Publication No.: US12008457B1
Publication date: 2024-06-11
Application No.: US17037515
Filing date: 2020-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
Abstract: Audio processing may be performed with a convolutional neural network that includes positional embeddings. Audio data may be received at an audio processing system. A convolutional neural network that concatenates frequency-positional embeddings at an input layer may be used to process the audio data. A result of processing the audio data through the convolutional neural network may be used to perform an audio processing task.
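The idea of concatenating frequency-positional embeddings at the input layer can be sketched as adding extra input channels whose values depend only on the frequency-bin index. The abstract does not specify the embedding, so the cosine form, the dimension parameter `dim`, and the function names below are assumptions; a learned embedding would fit the description equally well.

```python
import math


def freq_positional_embedding(num_bins, dim):
    """Return `dim` vectors of length `num_bins`, one value per frequency bin.

    The cosine form is an illustrative assumption.
    """
    denom = max(num_bins - 1, 1)
    return [[math.cos(math.pi * k * f / denom) for f in range(num_bins)]
            for k in range(1, dim + 1)]


def with_freq_embeddings(spectrogram, dim):
    """Stack embedding channels onto a [time][freq] spectrogram.

    Returns a channels-first array of shape [1 + dim][time][freq].
    """
    num_bins = len(spectrogram[0])
    channels = [[list(frame) for frame in spectrogram]]
    for row in freq_positional_embedding(num_bins, dim):
        # The same per-bin values are repeated for every time frame.
        channels.append([list(row) for _ in spectrogram])
    return channels
```

Because convolution kernels are shift-invariant along the frequency axis, extra channels like these give the network a way to tell which frequency region a kernel is currently covering.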