Two-pass end to end speech recognition

Granted invention patent

US12073824B2 Two-pass end to end speech recognition (in force)

Patent title: Two-pass end to end speech recognition
Application number: US17616135

Filing date: 2020-12-03
Publication number: US12073824B2

Publication date: 2024-08-27
Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-Yiin Chang, Wei Li
Applicant: GOOGLE LLC
Applicant address: Mountain View, CA, US
Patentee: GOOGLE LLC
Current patentee: GOOGLE LLC
Current patentee address: Mountain View, CA, US
Agency: Gray Ice Higdon
International application: PCT/US2020/063012, filed 2020-12-03
International publication: WO2021/113443A, published 2021-06-10
National phase entry date: 2021-12-02
Main classification: G10L15/00
IPC classification: G10L15/00; G06N3/08; G10L15/05; G10L15/06; G10L15/16; G10L15/22

Abstract:

Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
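
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `TwoPassRecognizer`, its three callables, and the toy functions are hypothetical stand-ins introduced only for illustration, and the sketch assumes the second pass runs once after the last audio frame, which the abstract does not state. It shows one shared encoder feeding both a streaming first pass (the role of the RNN-T decoder) and a revising second pass (the role of the LAS decoder).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]     # one chunk of audio features (hypothetical type)
Encoding = Sequence[float]  # shared-encoder output for one frame (hypothetical type)


@dataclass
class TwoPassRecognizer:
    """Toy wiring of the two-pass flow; every callable is a hypothetical stand-in."""

    shared_encoder: Callable[[Frame], Encoding]
    first_pass: Callable[[List[Encoding]], List[str]]        # RNN-T decoder role
    second_pass: Callable[[List[Encoding], List[str]], str]  # LAS decoder role

    def recognize(self, frames: Sequence[Frame]) -> Tuple[List[List[str]], str]:
        encodings: List[Encoding] = []
        streamed: List[List[str]] = []  # candidate lists emitted while audio arrives
        candidates: List[str] = []
        for frame in frames:
            # Each frame is encoded once; both passes reuse the same encodings.
            encodings.append(self.shared_encoder(frame))
            # First pass: streaming candidates from the audio seen so far.
            candidates = self.first_pass(encodings)
            streamed.append(candidates)
        # Second pass (assumed to run after the final frame): revise the candidates.
        final_text = self.second_pass(encodings, candidates)
        return streamed, final_text


# Toy stand-ins so the sketch runs end to end; real models would replace these.
def toy_encoder(frame: Frame) -> Encoding:
    return [sum(frame)]


def toy_first_pass(encodings: List[Encoding]) -> List[str]:
    n = len(encodings)
    return [f"hyp-a-{n}", f"hyp-b-{n}"]


def toy_second_pass(encodings: List[Encoding], candidates: List[str]) -> str:
    # Placeholder "revision": pick the lexicographically largest candidate.
    return max(candidates) if candidates else ""


recognizer = TwoPassRecognizer(toy_encoder, toy_first_pass, toy_second_pass)
streamed, final_text = recognizer.recognize([[0.1, 0.2], [0.3], [0.4]])
```

Keeping the encoder outside both decoders mirrors the shared-encoder arrangement in the abstract: the audio is encoded once and the second pass reuses those encodings rather than re-processing the audio.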

Publication/grant documents

US20220238101A1 TWO-PASS END TO END SPEECH RECOGNITION Publication/grant date: 2022-07-28

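IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)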