Publication number: US20230326461A1
Publication date: 2023-10-12
Application number: US18182925
Filing date: 2023-03-13
Applicant: Google LLC
Inventor: Shaojin Ding, Yangzhang He, Xin Wang, Weiran Wang, Trevor Strohman, Tara N. Sainath, Rohit Parkash Prabhavalkar, Robert David, Rina Panigrahy, Rami Botros, Qiao Liang, Ian Mcgraw, Ding Zhao, Dongseong Hwang
CPC classification number: G10L15/32, G10L15/16, G10L15/22, G10L2015/223
Abstract: An automated speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The first decoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a first probability distribution over possible speech recognition hypotheses. The second encoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a second higher order feature representation for a corresponding first higher order feature frame. The second decoder receives, as input, the second higher order feature representation generated by the second encoder, and generates a second probability distribution over possible speech recognition hypotheses.
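The data flow described in the abstract can be illustrated with a minimal Python sketch. It shows only the wiring: the first encoder feeds both the first decoder and the second encoder, and the second encoder feeds the second decoder. Everything else is an assumption made for illustration and is not taken from the patent: the names `ToyEncoder`, `ToyDecoder`, and `run_cascade`, the layer sizes, the random untrained weights, and the frame-wise linear-plus-softmax decoders. A practical decoder would likely also condition on previously emitted labels, which the abstract does not specify.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix; a stand-in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyEncoder:
    """Hypothetical encoder: one feature vector per input frame."""

    def __init__(self, in_dim, out_dim, rng):
        self.weights = make_matrix(out_dim, in_dim, rng)

    def __call__(self, frames):
        return [[math.tanh(v) for v in linear(self.weights, f)] for f in frames]


class ToyDecoder:
    """Hypothetical decoder: one distribution over the vocabulary per feature vector."""

    def __init__(self, in_dim, vocab_size, rng):
        self.weights = make_matrix(vocab_size, in_dim, rng)

    def __call__(self, features):
        return [softmax(linear(self.weights, f)) for f in features]


def run_cascade(frames, enc1, dec1, enc2, dec2):
    """Wire the four components in the order the abstract describes."""
    first_features = enc1(frames)            # first higher order feature representation
    first_probs = dec1(first_features)       # first probability distribution
    second_features = enc2(first_features)   # second encoder consumes first encoder output
    second_probs = dec2(second_features)     # second probability distribution
    return first_probs, second_probs


# Illustrative sizes only; none of these values come from the patent.
FRAME_DIM, HIDDEN_DIM, VOCAB_SIZE, NUM_FRAMES = 8, 4, 5, 3
rng = random.Random(0)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(FRAME_DIM)] for _ in range(NUM_FRAMES)]

enc1 = ToyEncoder(FRAME_DIM, HIDDEN_DIM, rng)
dec1 = ToyDecoder(HIDDEN_DIM, VOCAB_SIZE, rng)
enc2 = ToyEncoder(HIDDEN_DIM, HIDDEN_DIM, rng)
dec2 = ToyDecoder(HIDDEN_DIM, VOCAB_SIZE, rng)

first_probs, second_probs = run_cascade(frames, enc1, dec1, enc2, dec2)
print(len(first_probs), len(second_probs))
```

Both decoders return one distribution per acoustic frame here, so a caller could read a hypothesis from the first decoder and a second hypothesis from the second decoder once the additional encoder pass has run.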