- Patent title: Low-latency multi-speaker speech recognition
- Application number: US16534902
- Filing date: 2019-08-07
- Publication (grant) number: US11475898B2
- Publication (grant) date: 2022-10-18
- Inventors: Masood Delfarah, Ossama A. Abdelhamid, Kyuyeon Hwang, Donald R. McAllaster, Sabato Marco Siniscalchi
- Applicant: Apple Inc.
- Applicant address: US CA Cupertino
- Patentee: Apple Inc.
- Current patentee: Apple Inc.
- Current patentee address: US CA Cupertino
- Agent: Dentons US LLP
- Main classification: G10L17/00
- IPC classifications: G10L17/00; G10L17/02; G10L17/04; G10L15/20; G10L21/0272; G10L17/18
Abstract:
Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.
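The abstract fixes the data flow (mixed speech data plus a target speaker representation go in, per-frame probability distributions over phonetic elements come out, and text is generated from those distributions) but not the architecture. The Python sketch below only illustrates that flow under stated assumptions: the single linear layer with a softmax, the frame-wise concatenation of the speaker representation onto each feature frame, the greedy CTC-style decoding with a blank label, and every name (`phonetic_posteriors`, `greedy_ctc_decode`, `LABELS`, the toy weights and frames) are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence

# Hypothetical phonetic-element inventory; index 0 is a CTC-style blank (assumption).
LABELS = ["<blank>", "h", "i"]

# Toy parameters for one linear layer over [frame features + speaker embedding].
# One row per label; the last column weights the 1-D speaker representation.
WEIGHTS = [
    [0.0, 0.0, 0.5],  # blank
    [1.0, 0.0, 0.0],  # "h"
    [0.0, 1.0, 0.0],  # "i"
]
BIASES = [0.0, 0.0, 0.0]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phonetic_posteriors(
    mixed_frames: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[List[float]]:
    """Per-frame probability distributions over LABELS.

    Assumed conditioning scheme: the target speaker representation is appended
    to every mixed-speech feature frame before the toy linear layer + softmax.
    """
    posteriors = []
    for frame in mixed_frames:
        x = list(frame) + list(speaker_embedding)
        logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        posteriors.append(softmax(logits))
    return posteriors


def greedy_ctc_decode(
    posteriors: Sequence[Sequence[float]], labels: Sequence[str], blank: int = 0
) -> List[str]:
    """Take the argmax label per frame, collapse consecutive repeats, drop blanks."""
    decoded = []
    prev = None
    for dist in posteriors:
        best = max(range(len(dist)), key=dist.__getitem__)
        if best != prev and best != blank:
            decoded.append(labels[best])
        prev = best
    return decoded


# Four 2-D feature frames standing in for mixed speech (toy values).
FRAMES = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]

target = phonetic_posteriors(FRAMES, [1.0], WEIGHTS, BIASES)
print(greedy_ctc_decode(target, LABELS))  # prints ['h', 'i']

other = phonetic_posteriors(FRAMES, [5.0], WEIGHTS, BIASES)
print(greedy_ctc_decode(other, LABELS))  # prints []
```

With identical frames, changing only the speaker representation changes the decoded output, which is why the abstract lists the representation as a network input alongside the mixed speech. A practical system would presumably use a much deeper network and a stronger decoder (for example, beam search with a language model) than this toy layer and greedy decoding.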