Low-latency multi-speaker speech recognition

Granted invention patent

US11475898B2 Low-latency multi-speaker speech recognition (in force)


Patent title: Low-latency multi-speaker speech recognition
Application number: US16534902
Filing date: 2019-08-07
Publication number: US11475898B2
Publication date: 2022-10-18
Inventors: Masood Delfarah, Ossama A. Abdelhamid, Kyuyeon Hwang, Donald R. McAllaster, Sabato Marco Siniscalchi
Applicant: Apple Inc.
Applicant address: Cupertino, CA, US
Patentee: Apple Inc.
Current patentee: Apple Inc.
Current patentee address: Cupertino, CA, US
Agency: Dentons US LLP
Main classification: G10L17/00
IPC classifications: G10L17/00; G10L17/02; G10L17/04; G10L15/20; G10L21/0272; G10L17/18


Abstract:

Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.
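The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented network: a single linear layer with random, untrained weights stands in for the learning network, and every name and size below (phonetic_posteriors, N_FEATS, N_SPK, N_PHONES, N_FRAMES) is a hypothetical choice made for illustration. It shows only the interface: mixed speech frames and a target speaker representation go in together, and one probability distribution over phonetic elements comes out per frame, with no separate speech-separation stage.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def phonetic_posteriors(mixed_frames, speaker_vec, weights, bias):
    """Return one distribution over phonetic elements per frame.

    Each frame of mixed speech is concatenated with the target speaker
    representation, so the toy network is conditioned on the speaker.
    The single linear layer is an assumption, not the patent's design.
    """
    posteriors = []
    for frame in mixed_frames:
        x = list(frame) + list(speaker_vec)
        scores = [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)
        ]
        posteriors.append(softmax(scores))
    return posteriors


# Hypothetical sizes and random stand-in data, for illustration only.
N_FEATS, N_SPK, N_PHONES, N_FRAMES = 4, 3, 5, 6
rng = random.Random(0)
mixed_frames = [
    [rng.uniform(-1, 1) for _ in range(N_FEATS)] for _ in range(N_FRAMES)
]
speaker_vec = [rng.uniform(-1, 1) for _ in range(N_SPK)]
weights = [
    [rng.uniform(-1, 1) for _ in range(N_FEATS + N_SPK)]
    for _ in range(N_PHONES)
]
bias = [0.0] * N_PHONES

posteriors = phonetic_posteriors(mixed_frames, speaker_vec, weights, bias)

# A real system would pass these distributions to a decoder to produce
# text; picking the most likely element per frame is the simplest stand-in.
best_per_frame = [max(range(N_PHONES), key=p.__getitem__) for p in posteriors]
```

Because the weights are untrained, the resulting distributions carry no meaning; the point is that conditioning on the speaker representation happens inside the same network that produces the phonetic probabilities.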


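IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L17/00	Speaker identification or verification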