Invention Grant
- Patent Title: Fusion of acoustic and text representations in RNN-T
- Application No.: US17821160
- Application Date: 2022-08-19
- Publication No.: US12211509B2
- Publication Date: 2025-01-28
- Inventor: Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-yiin Chang
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Agency: Honigman LLP
- Agent: Brett A. Krueger; Grant Griffith
- Main IPC: G10L15/30
- IPC: G10L15/30; G06N7/01

Abstract:
A speech recognition model includes an encoder network, a prediction network, and a joint network. The encoder network is configured to receive a sequence of acoustic frames characterizing an input utterance and to generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The prediction network is configured to receive a sequence of non-blank symbols output by a final Softmax layer and to generate, at each of the plurality of output steps, a dense representation. The joint network is configured to generate, at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses based on the higher order feature representation and the dense representation. The joint network includes a stack of gating and bilinear pooling to fuse the dense representation and the higher order feature representation.
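The abstract does not give equations for the joint network, so the following is only a minimal pure-Python sketch of one plausible reading of "a stack of gating and bilinear pooling". It assumes a sigmoid gate, computed from both representations, that scales each input, followed by low-rank bilinear pooling of the gated vectors. It also shows how non-blank symbols from the final Softmax feed back into the prediction network during greedy decoding. Everything here is hypothetical rather than the patented design: the names `GatedBilinearJoint`, `EmbeddingPredictor`, and `greedy_decode`, blank at index 0, a stateless embedding-lookup prediction network, raw frames standing in for encoder outputs, the tanh combination, and the random weights.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]
BLANK = 0  # assumption: index 0 of the output vocabulary is the blank symbol


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: Vector) -> Vector:
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GatedBilinearJoint:
    """Hypothetical joint network: a gating stage stacked under low-rank bilinear pooling."""

    def __init__(self, dim: int, rank: int, vocab: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_gate = random_matrix(dim, 2 * dim, rng)  # gate computed from [h_enc; h_pred]
        self.u = random_matrix(rank, dim, rng)          # projects the gated acoustic vector
        self.v = random_matrix(rank, dim, rng)          # projects the gated text vector
        self.p = random_matrix(dim, rank, rng)          # maps the pooled vector back to dim
        self.w_out = random_matrix(vocab, dim, rng)     # output projection feeding the Softmax

    def __call__(self, h_enc: Vector, h_pred: Vector) -> Vector:
        # Stage 1 (gating): an element-wise sigmoid gate, computed from both inputs,
        # decides how much of each representation passes through.
        g = [sigmoid(z) for z in matvec(self.w_gate, h_enc + h_pred)]
        a = [gi * x for gi, x in zip(g, h_enc)]
        t = [(1.0 - gi) * y for gi, y in zip(g, h_pred)]
        # Stage 2 (low-rank bilinear pooling): the element-wise product of two
        # projections captures multiplicative acoustic-text interactions.
        pooled = [x * y for x, y in zip(matvec(self.u, a), matvec(self.v, t))]
        bilinear = matvec(self.p, pooled)
        # Add the gated (additive) path, apply tanh, then the final Softmax.
        hidden = [math.tanh(x + y + z) for x, y, z in zip(a, t, bilinear)]
        return softmax(matvec(self.w_out, hidden))


class EmbeddingPredictor:
    """Hypothetical stateless prediction network: embeds the last non-blank symbol."""

    def __init__(self, dim: int, vocab: int, seed: int = 1) -> None:
        self.table = random_matrix(vocab, dim, random.Random(seed))

    def __call__(self, last_symbol: int) -> Vector:
        return self.table[last_symbol]


def greedy_decode(frames: List[Vector], predictor: EmbeddingPredictor,
                  joint: GatedBilinearJoint, max_symbols: int = 3) -> List[int]:
    """Greedy RNN-T search; `frames` stand in for encoder outputs (assumption)."""
    output: List[int] = []
    last = BLANK  # blank doubles as the start-of-sequence context
    for h_enc in frames:
        for _ in range(max_symbols):
            probs = joint(h_enc, predictor(last))
            k = max(range(len(probs)), key=probs.__getitem__)
            if k == BLANK:
                break  # blank: advance to the next acoustic frame
            output.append(k)  # non-blank: emit it and feed it back to the predictor
            last = k
    return output


joint = GatedBilinearJoint(dim=4, rank=3, vocab=5)
predictor = EmbeddingPredictor(dim=4, vocab=5)
frames = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.1, 0.2]]
probs = joint(frames[0], predictor(BLANK))
print(len(probs), round(sum(probs), 6))  # 5 1.0
print(greedy_decode(frames, predictor, joint))  # hypothesis from untrained weights
```

The two stages are kept separate so each fusion path is easy to inspect. The gate lets the model choose, per dimension, between acoustic and text evidence. The bilinear term adds multiplicative interactions that a purely additive joint network cannot express, and the low-rank factorization (`u`, `v`, `p`) keeps that term cheaper than a full bilinear tensor.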