Multi-Talker Audio Stream Separation, Transcription and Diarization

    Publication number: US20240096346A1

    Publication date: 2024-03-21

    Application number: US17850617

    Application date: 2022-06-27

    CPC classification numbers: G10L21/10, G10L15/04, G10L21/0208

    Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
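The flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Segment`, `separate_and_transcribe`, `suppress`, and `transcribe` are hypothetical, the personalized noise suppression model and the speech recognizer are passed in as placeholder callables, and merging by segment start time is an assumption, since the abstract does not say how the single-talker transcriptions are merged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """One transcribed utterance with timestamps in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


# Hypothetical call signatures; neither model is specified in the abstract.
# suppress(audio, talker_embedding) -> single-talker audio stream
Suppressor = Callable[[Sequence[float], Sequence[float]], Sequence[float]]
# transcribe(single_talker_audio) -> timestamped segments
Transcriber = Callable[[Sequence[float]], List[Segment]]


def separate_and_transcribe(
    audio: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    suppress: Suppressor,
    transcribe: Transcriber,
) -> List[Tuple[str, Segment]]:
    """Run one suppression instance per talker embedding, transcribe each
    single-talker stream, and merge the results into one labeled transcript."""
    merged: List[Tuple[str, Segment]] = []
    for talker, embedding in embeddings.items():
        # Each instance keeps only the voice matching this talker's embedding.
        single_talker_audio = suppress(audio, embedding)
        for segment in transcribe(single_talker_audio):
            merged.append((talker, segment))
    # Assumed merge rule: order segments by start time, then end time.
    merged.sort(key=lambda pair: (pair[1].start, pair[1].end))
    return merged
```

Because each segment carries the label of the embedding that produced it, the merged list also serves as a simple diarization result (who spoke when). The embeddings themselves would come from pre-enrollment or a cluster-based derivation, as the abstract notes; that step is outside this sketch.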
