Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Arvindh . Krishnaswamy"

1.

Invention publication
Multi-Talker Audio Stream Separation, Transcription and Diaraization (Status: under examination, published)

Publication (announcement) number: US20240096346A1

Publication (announcement) date: 2024-03-21

Application number: US17850617

Application date: 2022-06-27

Applicant: Amazon Technologies, Inc.

Inventor: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh . Krishnaswamy, Siddhartha Shankara Rao

IPC: G10L21/10, G10L15/04, G10L21/0208

CPC classification number: G10L21/10, G10L15/04, G10L21/0208

Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
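The flow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the method claimed in the application: the names `separate_and_transcribe`, `toy_suppress` and `toy_transcribe` are hypothetical, the two callables stand in for the personalized noise suppression model and the transcription step, and merging by segment start time is an assumption, since the abstract does not say how the single-talker transcriptions are merged.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Audio = Any       # placeholder type for an audio stream (assumption)
Embedding = Any   # placeholder type for a talker embedding vector (assumption)


@dataclass(frozen=True)
class Segment:
    start: float  # segment start time in seconds
    talker: str   # label of the talker whose embedding produced this segment
    text: str     # transcribed text


def separate_and_transcribe(
    audio: Audio,
    embeddings: Dict[str, Embedding],
    suppress: Callable[[Audio, Embedding], Audio],
    transcribe: Callable[[Audio], List[Tuple[float, str]]],
) -> List[Segment]:
    """Run one suppression pass per talker, transcribe each, then merge."""
    merged: List[Segment] = []
    for talker, embedding in embeddings.items():
        # One "instance" of the suppression model per talker embedding.
        single_talker_audio = suppress(audio, embedding)
        # Transcribe the single-talker stream into (start, text) pairs.
        for start, text in transcribe(single_talker_audio):
            merged.append(Segment(start, talker, text))
    # Assumed merge rule: interleave all segments by start time.
    merged.sort(key=lambda seg: seg.start)
    return merged


# Toy stand-ins so the sketch runs without any real model.
# Here "audio" is a list of (start, talker_embedding, word) frames.
def toy_suppress(audio: Audio, embedding: Embedding) -> Audio:
    # Keep only the frames tagged with the requested embedding.
    return [frame for frame in audio if frame[1] == embedding]


def toy_transcribe(audio: Audio) -> List[Tuple[float, str]]:
    # Read the (start, word) pair straight out of each frame.
    return [(frame[0], frame[2]) for frame in audio]
```

Calling `separate_and_transcribe` with a mixed list of frames and a dictionary of talker labels to embeddings returns one time-ordered list of `Segment` records, which corresponds to the multi-talker output transcription with talker labels attached.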
