Publication number: US20240096346A1
Publication date: 2024-03-21
Application number: US17850617
Filing date: 2022-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao
IPC: G10L21/10, G10L15/04, G10L21/0208
CPC classification number: G10L21/10, G10L15/04, G10L21/0208
Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
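The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Segment`, `transcribe_multi_talker`, `suppress`, and `transcribe` are hypothetical, the noise suppression model and the speech recognizer are passed in as placeholder callables, and the merge step assumes each transcribed segment carries a start time, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Audio = Sequence[float]      # placeholder type for audio samples
Embedding = Sequence[float]  # placeholder type for a talker embedding vector


@dataclass
class Segment:
    """One transcribed utterance (hypothetical structure, not from the source)."""
    start: float  # seconds from stream start; assumed to be available
    text: str


def transcribe_multi_talker(
    audio: Audio,
    embeddings: Dict[str, Embedding],
    suppress: Callable[[Audio, Embedding], Audio],
    transcribe: Callable[[Audio], List[Segment]],
) -> List[str]:
    """Run one suppression pass per talker, transcribe each, then merge."""
    labeled = []
    for talker, embedding in embeddings.items():
        # One instance of the personalized model per talker embedding,
        # each producing a single-talker audio stream.
        single_talker_audio = suppress(audio, embedding)
        # One single-talker transcription per stream.
        for seg in transcribe(single_talker_audio):
            labeled.append((seg.start, talker, seg.text))
    # Assumed merge rule: interleave all segments by start time.
    labeled.sort(key=lambda item: item[0])
    return [f"{talker}: {text}" for _, talker, text in labeled]
```

In this sketch the embeddings could come from either route the abstract mentions (pre-enrollment or cluster-based derivation), since the function only needs a mapping from a talker label to its vector. Ordering by start time is one plausible way to merge; a real system might also need to handle overlapping speech.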