- Patent title: Reducing Streaming ASR Model Delay With Self Alignment
- Application No.: US17644377
- Filing date: 2021-12-15
- Publication No.: US20220310097A1
- Publication date: 2022-09-29
- Inventors: Jaeyoung Kim, Han Lu, Anshuman Tripathi, Qian Zhang, Hasim Sak
- Applicant: Google LLC
- Applicant address: US CA Mountain View
- Assignee: Google LLC
- Current assignee: Google LLC
- Current assignee address: US CA Mountain View
- Main classification: G10L15/26
- IPC classification: G10L15/26; G10L15/16
Abstract:
A streaming speech recognition model includes an audio encoder configured to receive a sequence of acoustic frames and generate a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The streaming speech recognition model also includes a label encoder configured to receive a sequence of non-blank symbols output by a final softmax layer and generate a dense representation. The streaming speech recognition model also includes a joint network configured to receive the higher order feature representation generated by the audio encoder and the dense representation generated by the label encoder and generate a probability distribution over possible speech recognition hypotheses. Here, the streaming speech recognition model is trained using self-alignment to reduce prediction delay by encouraging an alignment path that is one frame left from a reference forced-alignment frame.
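The three components named in the abstract can be illustrated with a minimal sketch. It assumes an additive fusion with a `tanh` nonlinearity inside the joint network and a vocabulary whose index 0 is the blank symbol; neither detail is stated in the abstract. The function names `joint_network` and `self_alignment_frames` are hypothetical, and the second function only shows what "one frame left from a reference forced-alignment frame" means for a list of emission frames, not how the training loss is computed.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_network(audio_feat, label_feat, out_weights):
    """Combine one audio-encoder vector with one label-encoder vector.

    Assumption: additive fusion, tanh, then a linear projection to the
    vocabulary. The abstract does not specify the combination rule.
    """
    hidden = [math.tanh(a + b) for a, b in zip(audio_feat, label_feat)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]
    return softmax(logits)


def self_alignment_frames(forced_frames):
    """Shift each reference emission frame one frame left, flooring at 0."""
    return [max(0, t - 1) for t in forced_frames]


# Toy inputs standing in for real encoder outputs (illustrative values only).
random.seed(0)
dim, vocab = 4, 3  # vocab index 0 is treated as blank (assumed convention)
audio_feat = [random.uniform(-1, 1) for _ in range(dim)]
label_feat = [random.uniform(-1, 1) for _ in range(dim)]
out_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]

probs = joint_network(audio_feat, label_feat, out_weights)

# Reference forced alignment: the frame at which each label is emitted.
shifted = self_alignment_frames([2, 5, 5, 9])
```

In this sketch `probs` is a distribution over the toy vocabulary for a single (frame, label-history) pair, and `shifted` is the earlier path that self-alignment training would encourage in place of the reference path `[2, 5, 5, 9]`.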
Publications / granted documents
- US12057124B2 Reducing streaming ASR model delay with self alignment (publication/grant date: 2024-08-06)