Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Neerad Dilip Phansalkar"

1.

Granted invention patent
Speech enhancement machine learning model for estimation of reverberation in a multi-task learning framework [in force]

Publication (announcement) number: US12014748B1

Publication (announcement) date: 2024-06-18

Application number: US16988423

Application date: 2020-08-07

Applicant: Amazon Technologies, Inc.

Inventor: Ritwik Giri, Mehmet Umut Isik, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy

IPC: G10L21/0208, G06N5/04, G06N20/00, G10L21/034

CPC classification number: G10L21/0208, G06N5/04, G06N20/00, G10L21/034, G10L2021/02082

Abstract: Techniques for training and using a machine learning model for estimation of reverberation in a multi-task learning framework are described. According to some embodiments, the multi-task learning framework improves the performance of the machine learning model by estimating the amount of reverberation present in an input audio recording as a secondary task to the primary task of generating a clean speech portion of the input audio recording. In one embodiment, a model architecture is selected that takes a noisy reverberant recording as an input and outputs an estimate of a clean (e.g., de-reverberated) signal, an estimate of noise (e.g., background noise), and an estimate of the reverb only portion, with the secondary task of estimating the reverb only portion acting as a regularizer that improves the machine learning model's performance in enhancing the reverberant (e.g., and noisy) input speech.
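
Illustrative sketch (not taken from the patent text): the abstract describes a primary clean-speech task with reverb estimation as a secondary, regularizing task. One common way to combine such tasks is a weighted sum of per-output losses. The mean-squared-error loss, the function names `mse` and `multitask_loss`, and the weights `noise_weight` and `reverb_weight` below are all assumptions made for illustration; the abstract does not specify any of them.

```python
def mse(estimate, target):
    """Mean squared error between two equal-length sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def multitask_loss(est, ref, noise_weight=0.5, reverb_weight=0.5):
    """Primary clean-speech loss plus weighted auxiliary losses.

    est and ref are dicts with keys "clean", "noise", and "reverb".
    The weights are hypothetical; they control how strongly the
    secondary outputs act as a regularizer on the shared model.
    """
    primary = mse(est["clean"], ref["clean"])        # main task
    aux_noise = mse(est["noise"], ref["noise"])      # secondary task
    aux_reverb = mse(est["reverb"], ref["reverb"])   # secondary task
    return primary + noise_weight * aux_noise + reverb_weight * aux_reverb
```

Setting `reverb_weight` to zero in this sketch removes the reverb term, which is one way to compare training with and without the secondary task.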

2.

Granted invention patent
Convolutional neural network with positional embeddings for audio processing [in force]

Publication (announcement) number: US12008457B1

Publication (announcement) date: 2024-06-11

Application number: US17037515

Application date: 2020-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy

IPC: G06N3/045, G06N3/082, G10L15/16

CPC classification number: G06N3/045, G06N3/082, G10L15/16

Abstract: Audio processing may be performed with a convolutional neural network that includes positional embeddings. Audio data may be received at an audio processing system. A convolutional neural network that concatenates frequency-positional embeddings at an input layer may be used to process the audio data. A result of processing the audio data through the convolutional neural network may be used to perform an audio processing task.
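
Illustrative sketch (not taken from the patent text): the abstract says frequency-positional embeddings are concatenated at the input layer. One plausible reading is that extra channels, each a fixed function of the frequency-bin index, are appended to the input feature map. The sinusoidal form, the `[channels][time][freq]` layout, and the function names below are assumptions for illustration only.

```python
import math


def frequency_positional_embedding(num_bins, num_channels):
    """Return num_channels rows with one value per frequency bin.

    Hypothetical scheme: alternating sine and cosine of the
    normalized bin index at increasing rates.
    """
    denom = max(num_bins - 1, 1)
    rows = []
    for k in range(num_channels):
        rate = k // 2 + 1
        fn = math.sin if k % 2 == 0 else math.cos
        rows.append([fn(math.pi * rate * b / denom) for b in range(num_bins)])
    return rows


def concat_embeddings(features, num_emb_channels):
    """Append embedding channels to features laid out as [channels][time][freq].

    Each embedding row is repeated over every time frame, so the added
    channels depend on frequency position only.
    """
    num_frames = len(features[0])
    num_bins = len(features[0][0])
    rows = frequency_positional_embedding(num_bins, num_emb_channels)
    emb_planes = [[list(row) for _ in range(num_frames)] for row in rows]
    copied = [[list(frame) for frame in channel] for channel in features]
    return copied + emb_planes
```

With a one-channel input of 2 frames by 3 bins and 4 embedding channels, the result has 5 channels of the same 2 by 3 shape, which a convolutional layer can then consume as ordinary input channels.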

3.

Granted invention patent
Ratio mask post-filtering for audio enhancement [in force]

Publication (announcement) number: US11521637B1

Publication (announcement) date: 2022-12-06

Application number: US17037498

Application date: 2020-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Jean-Marc Valin, Mehmet Umut Isik, Neerad Dilip Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy

IPC: G10L21/034, G06F3/16, G10L25/30

Abstract: Post-filtering may be performed for ratio masks as part of audio enhancement. Audio data may be received. A machine learning model may be applied to generate gain values for different spectrum bands of the audio data. The gain values may then be modified using an envelope post-filter according to a monotonically increasing function applied to the gain values to produce modified gain values used to generate an enhanced version of the audio data.
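
Illustrative sketch (not taken from the patent text): the abstract describes modifying per-band gain values with a monotonically increasing function. The abstract does not state which function is used; the choice g * sin(pi/2 * g) below is an assumption, picked because it is monotonically increasing on [0, 1], leaves gains of 0 and 1 unchanged, and attenuates intermediate gains further. The function name `envelope_postfilter` is likewise hypothetical.

```python
import math


def envelope_postfilter(gains):
    """Apply a monotonically increasing curve to per-band gains.

    Gains are clamped to [0, 1] first. The curve g * sin(pi/2 * g)
    is an assumed example, not a function confirmed by the source.
    """
    modified = []
    for g in gains:
        g = min(max(g, 0.0), 1.0)
        modified.append(g * math.sin(0.5 * math.pi * g))
    return modified
```

In this sketch, bands the model already trusts (gain near 1) pass almost unchanged, while bands with low gain are suppressed more strongly, and the ordering of the gains is preserved because the curve is monotonic.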
