Patent search: ap:("Amazon Technologies, Inc.") AND inv:"Masahito Togami"

1.

Published invention application
Multi-Talker Audio Stream Separation, Transcription and Diarization (status: under examination, published)

Publication (announcement) number: US20240096346A1

Publication (announcement) date: 2024-03-21

Application number: US17850617

Filing date: 2022-06-27

Applicant: Amazon Technologies, Inc.

Inventor: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao

IPC: G10L21/10, G10L15/04, G10L21/0208

CPC classification number: G10L21/10, G10L15/04, G10L21/0208

Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
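The pipeline in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Segment` type, the `suppress` and `transcribe` callables, and the choice to merge by segment start time are all assumptions made for the example, since the abstract does not say how the single-talker transcriptions are merged.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Segment:
    """One transcribed utterance attributed to a talker (hypothetical type)."""
    talker: str
    start: float  # segment start time in seconds
    text: str


def transcribe_multi_talker(
    audio: Any,
    embeddings: Dict[str, Any],
    suppress: Callable[[Any, Any], Any],
    transcribe: Callable[[Any], List[Tuple[float, str]]],
) -> List[Segment]:
    """Run one suppression pass per talker embedding, then merge transcripts.

    `suppress` stands in for an instance of the personalized noise
    suppression model; `transcribe` stands in for a speech recognizer that
    returns (start_time, text) pairs. Both are placeholders.
    """
    segments: List[Segment] = []
    for talker, embedding in embeddings.items():
        # Each pass keeps only the talker matching this embedding.
        single_talker_audio = suppress(audio, embedding)
        for start, text in transcribe(single_talker_audio):
            segments.append(Segment(talker, start, text))
    # Assumed merge rule: interleave all talkers by start time.
    return sorted(segments, key=lambda s: s.start)
```

Labeling each segment with the talker whose embedding produced it is what yields the diarization ("who spoke when") in this sketch.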

2.

Granted invention patent
Real-time low-complexity stereo speech enhancement with spatial cue preservation (status: in force)

Publication (announcement) number: US12167223B2

Publication (announcement) date: 2024-12-10

Application number: US17810303

Filing date: 2022-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin

IPC: H04S7/00, G10L21/0216, H04S1/00

Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.
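The downmix, enhance, upmix flow in this abstract can be illustrated with a small sketch. It makes several simplifying assumptions that the abstract does not state: the spatial cue is reduced to one broadband gain per channel, the gains are estimated by least-squares projection of each input channel onto the downmix (rather than for a separately identified target speaker), and `enhance_mono` is a hypothetical placeholder for the low-complexity model.

```python
from typing import Callable, List, Sequence, Tuple


def enhance_stereo(
    left: Sequence[float],
    right: Sequence[float],
    enhance_mono: Callable[[List[float]], List[float]],
) -> Tuple[List[float], List[float]]:
    """Downmix to mono, enhance, then upmix with per-channel gains."""
    # Downmix: average of the two channels.
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]

    # Assumed spatial cue: least-squares gain of each channel on the downmix.
    # If the downmix is silent, fall back to 1.0 so the gains become 0.
    energy = sum(m * m for m in mono) or 1.0
    gain_left = sum(l * m for l, m in zip(left, mono)) / energy
    gain_right = sum(r * m for r, m in zip(right, mono)) / energy

    # Placeholder for the low-complexity monaural enhancement model.
    enhanced = enhance_mono(mono)

    # Upmix: reapply the estimated gains to restore the stereo image.
    out_left = [gain_left * m for m in enhanced]
    out_right = [gain_right * m for m in enhanced]
    return out_left, out_right
```

Under these assumptions, a single source panned with fixed gains is reconstructed exactly when `enhance_mono` is the identity, which is a convenient sanity check for the upmix step.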

3.

Published invention application
REAL-TIME LOW-COMPLEXITY STEREO SPEECH ENHANCEMENT WITH SPATIAL CUE PRESERVATION (status: under examination, published)

Publication (announcement) number: US20240007817A1

Publication (announcement) date: 2024-01-04

Application number: US17810303

Filing date: 2022-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin

IPC: H04S7/00, H04S1/00, G10L21/0216

CPC classification number: H04S7/303, H04S1/007, G10L21/0216, H04S2400/03, H04S2400/11, H04S2400/15

Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.
