-
Publication number: US20240096346A1
Publication date: 2024-03-21
Application number: US17850617
Application date: 2022-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao
IPC: G10L21/10, G10L15/04, G10L21/0208
CPC classification number: G10L21/10, G10L15/04, G10L21/0208
Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
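Illustrative sketch (not taken from the patent): the data flow described in the abstract can be outlined as below. One instance of a personalized noise suppression model is run per talker embedding, each resulting single-talker stream is transcribed, and the transcriptions are merged. The names `Segment`, `suppress`, `transcribe`, and `merge_transcriptions` are hypothetical. Merging by segment start time is an assumption, since the abstract does not say how the merge is performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Segment:
    """One transcribed utterance from a single talker (hypothetical structure)."""
    talker: str
    start: float  # seconds from the start of the input stream
    end: float
    text: str


def merge_transcriptions(per_talker: Sequence[Sequence[Segment]]) -> List[Segment]:
    # Assumption: segments carry timestamps and are interleaved by start time.
    merged = [seg for segments in per_talker for seg in segments]
    merged.sort(key=lambda seg: (seg.start, seg.talker))
    return merged


def transcribe_multi_talker(
    audio: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    suppress: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
    transcribe: Callable[[str, Sequence[float]], List[Segment]],
) -> List[Segment]:
    per_talker = []
    for talker, embedding in embeddings.items():
        # One model instance per talker embedding, applied to the same input.
        single_talker_audio = suppress(audio, embedding)
        per_talker.append(transcribe(talker, single_talker_audio))
    return merge_transcriptions(per_talker)
```

Here `suppress` and `transcribe` stand in for the noise suppression model and the speech recognizer, which the abstract does not specify; any callables with these signatures can be passed in.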
-
Publication number: US12167223B2
Publication date: 2024-12-10
Application number: US17810303
Application date: 2022-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin
IPC: H04S7/00, G10L21/0216, H04S1/00
Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.
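Illustrative sketch (not taken from the patent): the downmix, enhance, upmix sequence described in the abstract can be outlined as below. The spatial cue used here is a single broadband level gain per channel relative to the downmix, estimated from the whole input rather than for a target speaker. That choice is a simplifying assumption, and the abstract does not say which cues are estimated or in which domain. The names `enhance_stereo` and `enhance_mono` are hypothetical, and `enhance_mono` stands in for the low-complexity model.

```python
import math
from typing import Callable, List, Sequence, Tuple


def energy(signal: Sequence[float]) -> float:
    return sum(v * v for v in signal)


def enhance_stereo(
    left: Sequence[float],
    right: Sequence[float],
    enhance_mono: Callable[[List[float]], List[float]],
    eps: float = 1e-12,
) -> Tuple[List[float], List[float]]:
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")

    # Downmix the stereo input into a monaural signal.
    mono = [0.5 * (l + r) for l, r in zip(left, right)]

    # Assumed spatial cue: level of each channel relative to the downmix.
    mono_energy = energy(mono) + eps  # eps avoids division by zero
    gain_left = math.sqrt(energy(left) / mono_energy)
    gain_right = math.sqrt(energy(right) / mono_energy)

    # Enhance the monaural signal with the (hypothetical) mono model.
    enhanced = enhance_mono(mono)

    # Upmix: reapply the estimated per-channel gains to the enhanced signal.
    out_left = [gain_left * v for v in enhanced]
    out_right = [gain_right * v for v in enhanced]
    return out_left, out_right
```

Because only one monaural signal is processed, the enhancement model runs once per frame instead of once per channel, which is consistent with the low-complexity goal stated in the abstract. A level-only cue like the one above cannot represent inter-channel time or phase differences.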
-
Publication number: US20240007817A1
Publication date: 2024-01-04
Application number: US17810303
Application date: 2022-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin
IPC: H04S7/00, H04S1/00, G10L21/0216
CPC classification number: H04S7/303, H04S1/007, G10L21/0216, H04S2400/03, H04S2400/11, H04S2400/15
Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.
-