Fusion of acoustic and text representations in RNN-T

Invention Grant

US12211509B2 Fusion of acoustic and text representations in RNN-T (Active)

Patent Title: Fusion of acoustic and text representations in RNN-T
Application No.: US17821160

Application Date: 2022-08-19
Publication No.: US12211509B2

Publication Date: 2025-01-28
Inventor: Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-yiin Chang
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant Griffith
Main IPC: G10L15/30
IPC: G10L15/30; G06N7/01

Abstract:

A speech recognition model includes an encoder network, a prediction network, and a joint network. The encoder network is configured to: receive a sequence of acoustic frames characterizing an input utterance; and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The prediction network is configured to: receive a sequence of non-blank symbols output by a final Softmax layer; and generate, at each of the plurality of output steps, a dense representation. The joint network is configured to generate, at each of the plurality of output steps based on the higher order feature representation and the dense representation, a probability distribution over possible speech recognition hypotheses. The joint network includes a stack of gating and bilinear pooling to fuse the dense representation and the higher order feature representation.
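
The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the parameter names, the dimensions, the choice of a sigmoid gate over the concatenated inputs, the low-rank form of bilinear pooling, and the shortcut connection that adds the gated vector back in. It shows one plausible way to stack gating and bilinear pooling for a single output step, not the method claimed in the patent.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def init_params(rng, d_enc, d_pred, d_joint, d_rank, vocab):
    """Random weights for the sketch; all names and shapes are hypothetical."""
    def w(rows, cols):
        return rng.standard_normal((rows, cols)) * 0.1

    return {
        "W_enc": w(d_enc, d_joint),            # projects the acoustic representation
        "W_pred": w(d_pred, d_joint),          # projects the text representation
        "W_gate": w(d_enc + d_pred, d_joint),  # produces the per-dimension gate
        "U": w(d_joint, d_rank),               # low-rank bilinear factor (gated side)
        "V": w(d_pred, d_rank),                # low-rank bilinear factor (text side)
        "P": w(d_rank, d_joint),               # maps the pooled vector back to d_joint
        "W_out": w(d_joint, vocab),            # output layer over symbols (incl. blank)
    }


def joint_network(params, h_enc, h_pred):
    """Fuse one encoder vector and one prediction-network vector."""
    a = np.tanh(h_enc @ params["W_enc"])
    t = np.tanh(h_pred @ params["W_pred"])

    # Gating: a sigmoid gate interpolates between the two projections.
    g = sigmoid(np.concatenate([h_enc, h_pred], axis=-1) @ params["W_gate"])
    h_gate = g * a + (1.0 - g) * t

    # Low-rank bilinear pooling: elementwise product of two projections,
    # here applied to the gated vector and the text representation.
    pooled = np.tanh(h_gate @ params["U"]) * np.tanh(h_pred @ params["V"])
    h_bilinear = pooled @ params["P"]

    # Stack the two: the shortcut from h_gate is an assumption of this sketch.
    h_joint = np.tanh(h_bilinear + h_gate)
    return softmax(h_joint @ params["W_out"])


rng = np.random.default_rng(0)
params = init_params(rng, d_enc=8, d_pred=6, d_joint=10, d_rank=4, vocab=5)
probs = joint_network(params, rng.standard_normal(8), rng.standard_normal(6))
print(probs.shape, float(probs.sum()))
```

In a full RNN-T this function would be evaluated for every pair of acoustic frame and label position. The sketch handles a single pair to keep the fusion itself visible.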

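IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/28	.Constructional details of speech recognition systems
G10L15/30	..Distributed recognition, e.g. in client-server systems, for mobile phones or network applications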