-
Publication No.: US20240169981A1
Publication Date: 2024-05-23
Application No.: US18512110
Filing Date: 2023-11-17
Applicant: Google LLC
Inventor: Wenqian Ronny Huang , Shuo-yiin Chang , Tara N. Sainath , Yanzhang He
IPC: G10L15/197 , G10L15/02 , G10L15/05 , G10L15/06 , G10L15/16
CPC classification number: G10L15/197 , G10L15/02 , G10L15/05 , G10L15/063 , G10L15/16 , G10L2015/025 , G10L15/22
Abstract: A unified end-to-end segmenter and two-pass automatic speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder is configured to receive a sequence of acoustic frames and generate a first higher order feature representation. The first decoder is configured to receive the first higher order feature representation and generate, at each of a plurality of output steps, a first probability distribution and an indication of whether the output step corresponds to an end of speech segment, and emit an end of speech timestamp. The second encoder is configured to receive the first higher order feature representation and the end of speech timestamp, and generate a second higher order feature representation. The second decoder is configured to receive the second higher order feature representation and generate a second probability distribution.
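The cascaded flow in the abstract above can be sketched as a toy pipeline. Every function here is a hypothetical stand-in (not the patented networks): each stage merely shapes data the way the abstract describes, with the first decoder emitting an end-of-speech timestamp that gates the second encoder.

```python
def first_encoder(frames):
    # First encoder: acoustic frames -> first higher-order feature representation.
    return [f * 2.0 for f in frames]

def first_decoder(features, eos_threshold=0.5):
    # First decoder: per-step probability distribution plus an end-of-speech
    # indication; emits the timestamp of the first step flagged as end of speech.
    probs, eos_ts = [], None
    for t, feat in enumerate(features):
        p_eos = 1.0 if feat > 5.0 else 0.0   # toy end-of-speech detector
        probs.append({"token_probs": [feat], "eos": p_eos})
        if eos_ts is None and p_eos >= eos_threshold:
            eos_ts = t
    return probs, eos_ts

def second_encoder(features, eos_ts):
    # Second (cascaded) encoder: consumes the first-pass features together with
    # the end-of-speech timestamp, attending only up to that point.
    segment = features if eos_ts is None else features[: eos_ts + 1]
    return [f + 1.0 for f in segment]

def two_pass(frames):
    feats = first_encoder(frames)
    first_probs, eos_ts = first_decoder(feats)
    second_feats = second_encoder(feats, eos_ts)
    return first_probs, eos_ts, second_feats
```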
-
Publication No.: US11475880B2
Publication Date: 2022-10-18
Application No.: US16809403
Filing Date: 2020-03-04
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Rohit Prakash Prabhavalkar , Gabor Simko , Tara N. Sainath , Bo Li , Yanzhang He
Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
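A minimal sketch of the decode-and-endpoint loop described above, assuming a hypothetical recognizer stub in place of the joint model: the caller stops consuming audio the moment the endpoint indication fires, so subsequent chunks are never processed.

```python
END_OF_UTTERANCE = "</s>"

def fake_recognizer(chunk):
    # Stand-in for the joint decoding/endpointing model: returns a pair of
    # (partial recognition result, endpoint detected?).
    text = chunk.strip()
    return text, text == END_OF_UTTERANCE

def stream_decode(audio_chunks):
    partials = []
    consumed = 0
    for chunk in audio_chunks:
        consumed += 1
        partial, endpointed = fake_recognizer(chunk)
        if endpointed:
            break  # terminate processing of any audio after the endpoint
        partials.append(partial)
    return " ".join(partials), consumed

text, consumed = stream_decode(["turn", "on", "the", "lights", "</s>", "ignored"])
```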
-
Publication No.: US20220238101A1
Publication Date: 2022-07-28
Application No.: US17616135
Filing Date: 2020-12-03
Applicant: GOOGLE LLC
Inventor: Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
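The shared-encoder layout can be illustrated with toy stand-ins (none of these functions are the real networks): a streaming RNN-T-style first pass emits candidates, and an LAS-style second pass revises them using the same encoder output.

```python
def shared_encoder(frames):
    # One encoder feeds both the first-pass and second-pass decoders.
    return [f * 0.5 for f in frames]

def rnnt_first_pass(encodings):
    # Streaming pass: emit one candidate token per encoder step.
    return ["tok%d" % i for i in range(len(encodings))]

def las_second_pass(encodings, candidates):
    # Non-streaming pass: attends over the full encoding and revises the
    # streaming candidates into the final text (here, a trivial "revision").
    return " ".join(candidates).upper()

def two_pass_asr(frames):
    enc = shared_encoder(frames)
    candidates = rnnt_first_pass(enc)
    return las_second_pass(enc, candidates)
```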
-
Publication No.: US20220122586A1
Publication Date: 2022-04-21
Application No.: US17447285
Filing Date: 2021-09-09
Applicant: Google LLC
Inventor: Jiahui Yu , Chung-cheng Chiu , Bo Li , Shuo-yiin Chang , Tara Sainath , Wei Han , Anmol Gulati , Yanzhang He , Arun Narayanan , Yonghui Wu , Ruoming Pang
Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
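One plausible reading of the tuning step above, sketched with toy numbers: the sequence-level alignment objective combines per-step label and blank emission probabilities, and a tuning parameter adds extra weight to the label-emission terms so the model favors emitting labels. This is a simplified illustration, not the patent's exact loss.

```python
import math

def tuned_objective(steps, lam):
    # steps: per-output-step (p_label, p_blank) emission probabilities.
    # Toy sequence-level alignment log-probability: sum of per-step terms.
    base = sum(math.log(pl) + math.log(pb) for pl, pb in steps)
    # Tuning parameter `lam` up-weights label emissions, pushing the model
    # toward emitting label tokens rather than deferring to blanks.
    label_bonus = sum(math.log(pl) for pl, _ in steps)
    return base + lam * label_bonus
```

With `lam = 0` this reduces to the plain alignment log-probability; increasing `lam` rewards alignments whose label probabilities are high.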
-
Publication No.: US12183322B2
Publication Date: 2024-12-31
Application No.: US17934555
Filing Date: 2022-09-22
Applicant: Google LLC
Inventor: Bo Li , Tara N. Sainath , Ruoming Pang , Shuo-yiin Chang , Qiumin Xu , Trevor Strohman , Vince Chen , Qiao Liang , Heguang Liu , Yanzhang He , Parisa Haghani , Sameer Bidichandani
Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.
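The four-way frame classification named at the end of the abstract can be mimicked with simple energy-and-position rules. The real model learns this classification; the thresholding below is an invented illustration of the label set only.

```python
def classify_frames(energies, threshold=0.1):
    # Label each frame as speech, initial_silence, intermediate_silence,
    # or final_silence, based on where it falls relative to detected speech.
    speech_idx = [i for i, e in enumerate(energies) if e >= threshold]
    if not speech_idx:
        return ["initial_silence"] * len(energies)
    first, last = speech_idx[0], speech_idx[-1]
    labels = []
    for i, e in enumerate(energies):
        if e >= threshold:
            labels.append("speech")
        elif i < first:
            labels.append("initial_silence")
        elif i > last:
            labels.append("final_silence")
        else:
            labels.append("intermediate_silence")
    return labels
```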
-
Publication No.: US20240420687A1
Publication Date: 2024-12-19
Application No.: US18815537
Filing Date: 2024-08-26
Applicant: GOOGLE LLC
Inventor: Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
-
Publication No.: US20240153495A1
Publication Date: 2024-05-09
Application No.: US18494984
Filing Date: 2023-10-26
Applicant: Google LLC
Inventor: Weiran Wang , Ding Zhao , Shaojin Ding , Hao Zhang , Shuo-yiin Chang , David Johannes Rybach , Tara N. Sainath , Yanzhang He , Ian McGraw , Shankar Kumar
IPC: G10L15/06 , G06F40/284 , G10L15/26
CPC classification number: G10L15/063 , G06F40/284 , G10L15/26
Abstract: A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset is paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for a corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes training the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.
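The joint-training objective above amounts to combining the two per-utterance losses. A minimal sketch, assuming a hypothetical weighting hyperparameter `aux_weight` (the abstract does not specify how the losses are combined):

```python
def training_step(batch, aux_weight=0.3):
    # batch: list of (asr_loss, aux_loss) pairs, one per training utterance.
    # Joint objective: speech recognition loss plus weighted auxiliary loss,
    # averaged over the utterances in the batch.
    total = 0.0
    for asr_loss, aux_loss in batch:
        total += asr_loss + aux_weight * aux_loss
    return total / len(batch)
```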
-
Publication No.: US20240135923A1
Publication Date: 2024-04-25
Application No.: US18485271
Filing Date: 2023-10-11
Applicant: Google LLC
Inventor: Chao Zhang , Bo Li , Tara N. Sainath , Trevor Strohman , Shuo-yiin Chang
IPC: G10L15/197 , G10L15/00 , G10L15/02
CPC classification number: G10L15/197 , G10L15/005 , G10L15/02
Abstract: A method includes receiving a sequence of acoustic frames as input to a multilingual automated speech recognition (ASR) model configured to recognize speech in a plurality of different supported languages and generating, by an audio encoder of the multilingual ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a language identification (LID) predictor of the multilingual ASR model, a language prediction representation for a corresponding higher order feature representation. The method also includes generating, by a decoder of the multilingual ASR model, a probability distribution over possible speech recognition results based on the corresponding higher order feature representation, a sequence of non-blank symbols, and a corresponding language prediction representation. The decoder includes a monolingual output layer having a plurality of output nodes each sharing a plurality of language-specific wordpiece models.
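The "output nodes sharing language-specific wordpiece models" idea can be sketched as one set of shared output nodes plus a per-language lookup that maps each node id to that language's wordpiece. The tables and function below are invented for illustration, not the patent's actual vocabulary.

```python
# Hypothetical per-language wordpiece tables over a shared set of output nodes.
WORDPIECE_TABLES = {
    "en": {0: "<blank>", 1: "he", 2: "llo"},
    "fr": {0: "<blank>", 1: "bon", 2: "jour"},
}

def decode_step(node_scores, lid):
    # node_scores: one score per shared output node (from the decoder).
    # lid: language predicted by the LID predictor for this step.
    best_node = max(range(len(node_scores)), key=node_scores.__getitem__)
    return WORDPIECE_TABLES[lid][best_node]
```

The same node index yields a different wordpiece depending on the predicted language, which is the sharing the abstract describes.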
-
Publication No.: US20230343332A1
Publication Date: 2023-10-26
Application No.: US18304064
Filing Date: 2023-04-20
Applicant: Google LLC
Inventor: Ronny Huang , Shuo-yiin Chang , David Rybach , Rohit Prakash Prabhavalkar , Tara N. Sainath , Cyril Allauzen , Charles Caleb Peyser , Zhiyun Lu
CPC classification number: G10L15/197 , G10L15/02 , G10L15/04 , G10L15/063 , G10L15/22 , G10L25/93 , G10L2015/025 , G10L2025/932
Abstract: A joint segmenting and ASR model includes an encoder and decoder. The encoder is configured to: receive a sequence of acoustic frames characterizing one or more utterances; and generate, at each output step, a higher order feature representation for a corresponding acoustic frame. The decoder is configured to: receive the higher order feature representation and generate, at each output step: a probability distribution over possible speech recognition hypotheses, and an indication of whether the corresponding output step corresponds to an end of speech segment. The joint segmenting and ASR model is trained on a set of training samples, each training sample including: audio data characterizing a spoken utterance; and a corresponding transcription of the spoken utterance, the corresponding transcription having an end of speech segment ground truth token inserted into the corresponding transcription automatically based on a set of heuristic-based rules and exceptions applied to the training sample.
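The automatic ground-truth labeling step can be sketched as a rule-based token inserter: place an end-of-segment token at sentence-final punctuation, with an exception list so abbreviations like "Dr." do not end a segment. The token name, rules, and exception list below are invented examples of the "heuristic-based rules and exceptions" the abstract mentions.

```python
EOS = "</eos>"
# Hypothetical exception list: words ending in "." that do not end a segment.
NON_FINAL = {"dr.", "mr.", "mrs.", "etc."}

def insert_eos(transcript):
    # Insert an end-of-speech-segment token after sentence-final punctuation,
    # skipping known abbreviations.
    out = []
    for word in transcript.split():
        out.append(word)
        if word[-1] in ".?!" and word.lower() not in NON_FINAL:
            out.append(EOS)
    return " ".join(out)
```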
-
Publication No.: US20230335117A1
Publication Date: 2023-10-19
Application No.: US18186872
Filing Date: 2023-03-20
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Guru Prakash Arumugam , Zelin Wu , Tara N. Sainath , Bo Li , Qiao Liang , Adam Stambler , Shyam Upadhyay , Manaal Faruqui , Trevor Strohman
CPC classification number: G10L15/16 , G10L15/22 , G10L15/063 , G10L2015/223
Abstract: A method includes receiving, as input to a speech recognition model, audio data corresponding to a spoken utterance. The method also includes performing, using the speech recognition model, speech recognition on the audio data by, at each of a plurality of time steps, encoding, using an audio encoder, the audio data corresponding to the spoken utterance into a corresponding audio encoding, and decoding, using a speech recognition joint network, the corresponding audio encoding into a probability distribution over possible output labels. At each of the plurality of time steps, the method also includes determining, using an intended query (IQ) joint network configured to receive a label history representation associated with a sequence of non-blank symbols output by a final softmax layer, an intended query decision indicating whether or not the spoken utterance includes a query intended for a digital assistant.
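The per-step intended-query decision can be illustrated with a toy stand-in for the IQ joint network: at each time step it sees the label history emitted so far and outputs whether the utterance targets the assistant. A keyword rule replaces the learned network here; the cue set and function names are hypothetical.

```python
# Invented cue words standing in for what the IQ joint network would learn.
ASSISTANT_CUES = {"assistant", "play", "set", "turn"}

def iq_decision(label_history):
    # label_history: the non-blank symbols output so far (per the softmax).
    return any(tok.lower() in ASSISTANT_CUES for tok in label_history)

def stream_iq(tokens):
    # Emit one intended-query decision per time step, as in the abstract.
    decisions, history = [], []
    for tok in tokens:
        history.append(tok)
        decisions.append(iq_decision(history))
    return decisions
```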