Patent search ap:("GOOGLE LLC") AND inv:"Yanzhang He" Page 2

11.

发明授权
Key phrase spotting 有权

公开(公告)号：US11948570B2

公开(公告)日：2024-04-02

申请号：US17654195

申请日：2022-03-09

Applicant: Google LLC

Inventor： Wei Li , Rohit Prakash Prabhavalkar , Kanury Kanishka Rao , Yanzhang He , Ian C. Mcgraw , Anton Bakhtin

IPC: G10L15/06 , G10L15/02 , G10L15/16 , G10L15/18 , G10L15/22 , G10L19/00 , G10L15/08 , G10L15/14

CPC classification number: G10L15/22 , G10L15/02 , G10L15/063 , G10L15/18 , G10L19/00 , G10L2015/025 , G10L2015/088 , G10L15/142 , G10L2015/223

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.

12.

发明授权
Language agnostic multilingual end-to-end streaming on-device ASR system 有权

公开(公告)号：US12183322B2

公开(公告)日：2024-12-31

申请号：US17934555

申请日：2022-09-22

Applicant: Google LLC

Inventor： Bo Li , Tara N. Sainath , Ruoming Pang , Shuo-yiin Chang , Qiumin Xu , Trevor Strohman , Vince Chen , Qiao Liang , Heguang Liu , Yanzhang He , Parisa Haghani , Sameer Bidichandani

IPC: G10L15/00 , G10L15/06 , G10L15/22 , G10L15/30

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

13.

发明申请
TWO-PASS END TO END SPEECH RECOGNITION 有权

公开(公告)号：US20240420687A1

公开(公告)日：2024-12-19

申请号：US18815537

申请日：2024-08-26

Applicant: GOOGLE LLC

Inventor： Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li

IPC: G10L15/16 , G06N3/08 , G10L15/05 , G10L15/06 , G10L15/22

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

14.

发明授权
Joint acoustic echo cancelation, speech enhancement, and voice separation for automatic speech recognition 有权

公开(公告)号：US12119014B2

公开(公告)日：2024-10-15

申请号：US17644108

申请日：2021-12-14

Applicant: Google LLC

Inventor： Arun Narayanan , Tom O'malley , Quan Wang , Alex Park , James Walker , Nathan David Howard , Yanzhang He , Chung-Cheng Chiu

IPC: G10L21/0216 , G06N3/04 , G10L15/06 , G10L21/0208 , H04R3/04

CPC classification number: G10L21/0216 , G06N3/04 , G10L15/063 , H04R3/04 , G10L2021/02082

Abstract: A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding vector to generate enhanced speech features.

15.

发明公开
Multi-Output Decoders for Multi-Task Learning of ASR and Auxiliary Tasks 审中-公开

公开(公告)号：US20240153495A1

公开(公告)日：2024-05-09

申请号：US18494984

申请日：2023-10-26

Applicant: Google LLC

Inventor： Weiran Wang , Ding Zhao , Shaojin Ding , Hao Zhang , Shuo-yiin Chang , David Johannes Rybach , Tara N. Sainath , Yanzhang He , Ian McGraw , Shankar Kumar

IPC: G10L15/06 , G06F40/284 , G10L15/26

CPC classification number: G10L15/063 , G06F40/284 , G10L15/26

Abstract: A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for a corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.

16.

发明授权
Efficient streaming non-recurrent on-device end-to-end model 有权

公开(公告)号：US11715458B2

公开(公告)日：2023-08-01

申请号：US17316198

申请日：2021-05-10

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/00 , G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

17.

发明授权
Attention-based joint acoustic and text on-device end-to-end model 有权

公开(公告)号：US11594212B2

公开(公告)日：2023-02-28

申请号：US17155010

申请日：2021-01-21

Applicant: Google LLC

Inventor： Tara N. Sainath , Ruoming Pang , Ron Weiss , Yanzhang He , Chung-Cheng Chiu , Trevor Strohman

IPC: G10L15/06 , G06N3/08 , G10L15/16 , G10L15/197

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

18.

发明授权
Voice shortcut detection with speaker verification 有权

公开(公告)号：US11568878B2

公开(公告)日：2023-01-31

申请号：US17233253

申请日：2021-04-16

Applicant: Google LLC

Inventor： Rajeev Rikhye , Quan Wang , Yanzhang He , Qiao Liang , Ian C. McGraw

IPC: G10L17/24 , G10L17/06 , G10L21/028

Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.

19.

发明公开
Robustness Aware Norm Decay for Quantization Aware Training and Generalization 审中-公开

公开(公告)号：US20240347043A1

公开(公告)日：2024-10-17

申请号：US18632237

申请日：2024-04-10

Applicant: Google LLC

Inventor： David Qiu , David Rim , Shaojin Ding , Yanzhang He

IPC: G10L15/06

CPC classification number: G10L15/063

Abstract: A method includes obtaining a plurality of training samples, determining a minimum integer fixed-bit width representing a maximum quantization of an automatic speech recognition (ASR) model, and training the ASR model on the plurality of training samples using a quantity of random noise. The ASR model includes a plurality of weights that each include a respective float value. The quantity of random noise is based on the minimum integer fixed-bit value. After training the ASR model, the method also includes selecting a target integer fixed-bit width greater than or equal to the minimum integer fixed-bit width, and for each respective weight of the plurality of weights, quantizing the respective weight from the respective float value to a respective integer associated with a value of the selected target integer fixed-bit width. The operations also include providing the quantized trained ASR model to a user device.

20.

发明公开
End-To-End Segmentation in a Two-Pass Cascaded Encoder Automatic Speech Recognition Model 审中-公开

公开(公告)号：US20240169981A1

公开(公告)日：2024-05-23

申请号：US18512110

申请日：2023-11-17

Applicant: Google LLC

Inventor： Wenqian Ronny Huang , Shuo-yiin Chang , Tara N. Sainath , Yanzhang He

IPC: G10L15/197 , G10L15/02 , G10L15/05 , G10L15/06 , G10L15/16

CPC classification number: G10L15/197 , G10L15/02 , G10L15/05 , G10L15/063 , G10L15/16 , G10L2015/025 , G10L15/22

Abstract: A unified end-to-end segmenter and two-pass automatic speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder is configured to receive a sequence of acoustic frames and generate a first higher order feature representation. The first decoder is configured to receive the first higher order feature representation and generate, at each of a plurality of output steps, a first probability distribution and an indication of whether the output step corresponds to an end of speech segment, and emit an end of speech timestamp. The second encoder is configured to receive the first higher order feature representation and the end of speech timestamp, and generate a second higher order feature representation. The second decoder is configured to receive the second higher order feature representation and generate a second probability distribution.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification