Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Mehmet Umut Isik"

1.

Granted invention patent
Joint noise and echo suppression for two-way audio communication enhancement (in force)

Publication number: US11924367B1

Publication date: 2024-03-05

Application number: US17668297

Application date: 2022-02-09

Applicant: Amazon Technologies, Inc.

Inventor: Jean-Marc Valin, Karim Helwani, Srikanth Venkata Tenneti, Erfan Soltanmohammadi, Mehmet Umut Isik, Richard Newman, Michael Mark Goodwin, Arvindh Krishnaswamy

IPC: H04M3/00, G10L21/0232, G10L21/034, G10L25/18, H04S3/00, G10L21/0208

CPC classification number: H04M3/002, G10L21/0232, G10L21/034, G10L25/18, H04S3/008, G10L2021/02082, H04S2400/01, H04S2400/03

Abstract: Joint noise and echo suppression may be performed for enhancing two-way audio communications. Audio data captured at a communication device and audio data transmitted to the communication device from another communication device are used as input features to a trained machine learning model that uses the transmitted audio data as a reference signal to eliminate residual echo in the captured audio data while also suppressing noise in the captured audio data.

2.

Granted invention patent
Group masked autoencoder for anomaly detection (in force)

Publication number: US12205039B1

Publication date: 2025-01-21

Application number: US17087181

Application date: 2020-11-02

Applicant: Amazon Technologies, Inc.

Inventor: Ritwik Giri, Srikanth Venkata Tenneti, Karim Helwani, Fangzhou Cheng, Mehmet Umut Isik, Arvindh Krishnaswamy

IPC: G06N3/088, G10L25/03, G10L25/51

Abstract: A group masked autoencoder may be implemented for anomaly detection. An autoencoder network model may be trained without supervision and applied to output an estimated joint probability distribution of normality for a group of frames of time series data. The estimated joint probability distribution may be used to determine an anomaly score for the time series data. An anomaly may be detected according to the anomaly score and a result that indicates a detected anomaly may be provided.

3.

Granted invention patent
Ratio mask post-filtering for audio enhancement (in force)

Publication number: US11521637B1

Publication date: 2022-12-06

Application number: US17037498

Application date: 2020-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Jean-Marc Valin, Mehmet Umut Isik, Neerad Dilip Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy

IPC: G10L21/034, G06F3/16, G10L25/30

Abstract: Post-filtering may be performed for ratio masks as part of audio enhancement. Audio data may be received. A machine learning model may be applied to generate gain values for different spectrum bands of the audio data. The gain values may then be modified using an envelope post-filter according to a monotonically increasing function applied to the gain values to produce modified gain values used to generate an enhanced version of the audio data.
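The post-filtering step in this abstract can be illustrated with a minimal sketch. The abstract only says the function is monotonically increasing; the warp used here, g * sin(pi/2 * g), is an assumption chosen for illustration, and the name post_filter_gains is hypothetical. Gains are assumed to lie in [0, 1], one per spectrum band.

```python
import math


def post_filter_gains(gains):
    """Warp per-band gains with a monotonically increasing function.

    Assumption: gains are ratio-mask values in [0, 1]. The warp
    g * sin(pi/2 * g) is an illustrative choice, not taken from the patent.
    It leaves 0 and 1 unchanged and attenuates intermediate gains further.
    """
    modified = []
    for g in gains:
        g = min(max(g, 0.0), 1.0)  # clamp to the assumed valid range
        modified.append(g * math.sin(0.5 * math.pi * g))
    return modified


if __name__ == "__main__":
    band_gains = [0.0, 0.2, 0.5, 0.9, 1.0]
    print(post_filter_gains(band_gains))
```

With this particular warp, bands the model already considers mostly noise (low gain) are suppressed more strongly, while bands with gain near 1 pass through almost unchanged.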

4.

Granted invention patent
Speech enhancement machine learning model for estimation of reverberation in a multi-task learning framework (in force)

Publication number: US12014748B1

Publication date: 2024-06-18

Application number: US16988423

Application date: 2020-08-07

Applicant: Amazon Technologies, Inc.

Inventor: Ritwik Giri, Mehmet Umut Isik, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy

IPC: G10L21/0208, G06N5/04, G06N20/00, G10L21/034

CPC classification number: G10L21/0208, G06N5/04, G06N20/00, G10L21/034, G10L2021/02082

Abstract: Techniques for training and using a machine learning model for estimation of reverberation in a multi-task learning framework are described. According to some embodiments, the multi-task learning framework improves the performance of the machine learning model by estimating the amount of reverberation present in an input audio recording as a secondary task to the primary task of generating a clean speech portion of the input audio recording. In one embodiment, a model architecture is selected that takes a noisy reverberant recording as an input and outputs an estimate of a clean (e.g., de-reverberated) signal, an estimate of noise (e.g., background noise), and an estimate of the reverb only portion, with the secondary task of estimating the reverb only portion acting as a regularizer that improves the machine learning model's performance in enhancing the reverberant (e.g., and noisy) input speech.

5.

Granted invention patent
Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy (in force)

Publication number: US11545134B1

Publication date: 2023-01-03

Application number: US16709792

Application date: 2019-12-10

Applicant: Amazon Technologies, Inc.

Inventor: Marcello Federico, Robert Enyedi, Yaser Al-Onaizan, Roberto Barra-Chicote, Andrew Paul Breen, Ritwik Giri, Mehmet Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf

IPC: G10L13/08, G10L15/22, G11B20/10, G06F3/16, G10L13/10, G06F40/47, G10L25/90, G10L15/06, G10L13/00, G10L15/26, G06V40/16

Abstract: Techniques for the generation of dubbed audio for an audio/video are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.

6.

Granted invention patent
Real-time target speaker audio enhancement (in force)

Publication number: US12272371B1

Publication date: 2025-04-08

Application number: US17364805

Application date: 2021-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Mehmet Umut Isik, Arvindh Krishnaswamy

IPC: G06F17/00, G06N20/00, G10L21/013, G10L21/0364, G10L21/038

Abstract: Real-time audio enhancement for a target speaker may be performed. An embedding of a sample of speaker audio is created using a trained neural network that performs voice identification. The embedding is then concatenated with the input features of a trained machine learning model for audio enhancement. The audio enhancement model can recognize and enhance a target speaker's speech in a real-time implementation, as the embedding is in the same feature space of the audio enhancement model.

7.

Granted invention patent
Prognostics and health management service (in force)

Publication number: US12175434B2

Publication date: 2024-12-24

Application number: US17039649

Application date: 2020-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Karim Helwani, Mehmet Umut Isik, Ritwik Giri, Fangzhou Cheng, Aparna Pandey

IPC: G06Q10/20, G06F16/21, G06F16/906

Abstract: Systems, methods, and apparatuses for detecting anomalies using clusters are described. In some examples, a method includes receiving a request to perform anomaly detection using a plurality of clusters; receiving a data point; determining when the received data point is a part of one of the plurality of clusters utilizing a distance to centers of the one or more clusters, wherein: when the received data point is determined to belong to a normal cluster, assigning the received data point to the determined cluster, updating the cluster, and updating a history for the cluster, when the received data point is determined to belong to an anomalous cluster, raising an anomaly, updating the cluster, and updating a history for the cluster, and when the received data point is determined to not belong to any cluster, raising an anomaly.
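The three-way decision described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: membership is decided by Euclidean distance to the nearest cluster center against a fixed radius, the cluster update is a running mean, and the names Cluster, process_point, and radius are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    center: list            # current center of the cluster
    anomalous: bool = False  # whether the cluster is labeled anomalous
    count: int = 1           # number of points absorbed so far
    history: list = field(default_factory=list)

    def update(self, point):
        """Move the center toward the point (running mean) and log it."""
        self.count += 1
        self.center = [c + (p - c) / self.count
                       for c, p in zip(self.center, point)]
        self.history.append(list(point))


def process_point(point, clusters, radius):
    """Return True if an anomaly is raised for the received data point."""
    nearest = min(clusters,
                  key=lambda c: math.dist(c.center, point),
                  default=None)
    # The point belongs to no cluster: raise an anomaly, update nothing.
    if nearest is None or math.dist(nearest.center, point) > radius:
        return True
    # The point belongs to a cluster: update the cluster and its history.
    nearest.update(point)
    # Raise an anomaly only if that cluster is labeled anomalous.
    return nearest.anomalous


if __name__ == "__main__":
    clusters = [Cluster([0.0, 0.0]), Cluster([10.0, 10.0], anomalous=True)]
    print(process_point([0.5, 0.0], clusters, radius=2.0))
    print(process_point([10.0, 11.0], clusters, radius=2.0))
    print(process_point([5.0, 5.0], clusters, radius=2.0))
```

The fixed radius is the simplest possible membership rule; a per-cluster threshold derived from each cluster's history would be a natural refinement, but the abstract does not specify one.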

8.

Granted invention patent
Convolutional neural network with positional embeddings for audio processing (in force)

Publication number: US12008457B1

Publication date: 2024-06-11

Application number: US17037515

Application date: 2020-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy

IPC: G06N3/045, G06N3/082, G10L15/16

CPC classification number: G06N3/045, G06N3/082, G10L15/16

Abstract: Audio processing may be performed with a convolutional neural network that includes positional embeddings. Audio data may be received at an audio processing system. A convolutional neural network that concatenates frequency-positional embeddings at an input layer may be used to process the audio data. A result of processing the audio data through the convolutional neural network may be used to perform an audio processing task.
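As a rough illustration of concatenating frequency-positional embeddings at an input layer, the sketch below appends extra channels to a spectrogram, where each extra channel depends only on the frequency-bin index. The cosine form of the embedding, the channels x bins x frames layout, and the function names are assumptions made for this example; the abstract does not specify them.

```python
import math


def frequency_positional_embeddings(num_bins, num_embeddings):
    """Build num_embeddings vectors, each with one value per frequency bin.

    Assumption: cosine embeddings of increasing rate over the bin index.
    """
    denom = max(num_bins - 1, 1)
    return [[math.cos(math.pi * (k + 1) * f / denom) for f in range(num_bins)]
            for k in range(num_embeddings)]


def concat_embeddings(spectrogram, num_embeddings):
    """Append embedding channels to a [channels][bins][frames] input.

    Each embedding value is repeated across all time frames, so a
    convolution kernel can tell which frequency region it is looking at.
    """
    num_bins = len(spectrogram[0])
    num_frames = len(spectrogram[0][0])
    emb = frequency_positional_embeddings(num_bins, num_embeddings)
    extra = [[[emb[k][f]] * num_frames for f in range(num_bins)]
             for k in range(num_embeddings)]
    return spectrogram + extra


if __name__ == "__main__":
    spec = [[[0.1, 0.2, 0.3] for _ in range(4)]]  # 1 channel, 4 bins, 3 frames
    out = concat_embeddings(spec, num_embeddings=2)
    print(len(out), len(out[0]), len(out[0][0]))
```

The resulting tensor would then be fed to the first convolutional layer in place of the raw spectrogram.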

9.

Invention application
PROGNOSTICS AND HEALTH MANAGEMENT SERVICE (in force)

Publication number: US20220101270A1

Publication date: 2022-03-31

Application number: US17039649

Application date: 2020-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Karim Helwani, Mehmet Umut Isik, Ritwik Giri, Fangzhou Cheng, Aparna Pandey

IPC: G06Q10/00, G06F16/906, G06F16/21

Abstract: Systems, methods, and apparatuses for detecting anomalies using clusters are described. In some examples, a method includes receiving a request to perform anomaly detection using a plurality of clusters; receiving a data point; determining when the received data point is a part of one of the plurality of clusters utilizing a distance to centers of the one or more clusters, wherein: when the received data point is determined to belong to a normal cluster, assigning the received data point to the determined cluster, updating the cluster, and updating a history for the cluster, when the received data point is determined to belong to an anomalous cluster, raising an anomaly, updating the cluster, and updating a history for the cluster, and when the received data point is determined to not belong to any cluster, raising an anomaly.
