Reducing Streaming ASR Model Delay With Self Alignment

Patent application

US20220310097A1 Reducing Streaming ASR Model Delay With Self Alignment (in force)

Title: Reducing Streaming ASR Model Delay With Self Alignment
Application number: US17644377

Filing date: 2021-12-15
Publication number: US20220310097A1

Publication date: 2022-09-29
Inventors: Jaeyoung Kim, Han Lu, Anshuman Tripathi, Qian Zhang, Hasim Sak
Applicant: Google LLC
Applicant address: Mountain View, CA, US
Patent holder: Google LLC
Current patent holder: Google LLC
Current patent holder address: Mountain View, CA, US
Main classification: G10L15/26
IPC classification: G10L15/26; G10L15/16

Abstract:

A streaming speech recognition model includes an audio encoder configured to receive a sequence of acoustic frames and generate a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The streaming speech recognition model also includes a label encoder configured to receive a sequence of non-blank symbols output by a final softmax layer and generate a dense representation. The streaming speech recognition model also includes a joint network configured to receive the higher order feature representation generated by the audio encoder and the dense representation generated by the label encoder and generate a probability distribution over possible speech recognition hypotheses. Here, the streaming speech recognition model is trained using self-alignment to reduce prediction delay by encouraging an alignment path that is one frame left from a reference forced-alignment frame.
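The self-alignment idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a reference forced alignment is available as a list of emission frame indices, one per label, and that a table `log_probs[t][u]` gives the log-probability of emitting label `u` at frame `t`. The function names, the table layout, and the form of the penalty are all hypothetical and chosen only for illustration.

```python
import math


def left_shifted_alignment(emit_frames):
    """Shift each label's emission frame one frame to the left.

    emit_frames[u] is the (assumed) forced-alignment frame of label u.
    Frames are clamped at 0 so the first frame cannot move further left.
    """
    return [max(0, t - 1) for t in emit_frames]


def self_alignment_penalty(log_probs, emit_frames):
    """Negative log-probability summed along the left-shifted path.

    log_probs[t][u] is a hypothetical log-probability of emitting label u
    at frame t. Adding this term to a training loss would reward the model
    for emitting each label one frame earlier than the reference alignment.
    """
    left = left_shifted_alignment(emit_frames)
    return -sum(log_probs[t][u] for u, t in enumerate(left))


if __name__ == "__main__":
    # Toy reference alignment: three labels emitted at frames 0, 3 and 3.
    reference = [0, 3, 3]
    print(left_shifted_alignment(reference))

    # Toy table: 4 frames x 3 labels, every entry has probability 0.5.
    table = [[math.log(0.5)] * 3 for _ in range(4)]
    print(self_alignment_penalty(table, reference))
```

In this sketch the penalty would be minimized when the model assigns high probability to each label one frame before its reference frame, which is one simple way to read "encouraging an alignment path that is one frame left".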

Published/granted documents

US12057124B2 Reducing streaming ASR model delay with self alignment. Publication/grant date: 2024-08-06

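IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/26	. Speech to text systems (G10L15/08 takes precedence)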