1.
Publication Number: US20240304185A1
Publication Date: 2024-09-12
Application Number: US18598885
Filing Date: 2024-03-07
Applicant: Google LLC
Inventor: Ke Hu , Bo Li , Tara N. Sainath , Yu Zhang , Francoise Beaufays
IPC: G10L15/197 , G10L15/02 , G10L15/06
CPC classification number: G10L15/197 , G10L15/02 , G10L15/063
Abstract: A method for a multilingual automatic speech recognition (ASR) model includes receiving a sequence of acoustic frames characterizing an utterance of speech. At a plurality of output steps, the method further includes generating a first higher order feature representation for an acoustic frame by a first encoder that includes a first plurality of multi-head attention layers; generating a second higher order feature representation for a corresponding first higher order feature representation by a second encoder that includes a second plurality of multi-head attention layers; and generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on the second higher order feature representation and a sequence of N previous non-blank symbols. A gating layer of each respective mixture-of-experts (MoE) layer is configured to dynamically route, at each of the plurality of output steps, an output from a previous multi-head attention layer to a respective pair of feed-forward expert networks.
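The routing step in the last sentence can be illustrated with a minimal NumPy sketch: a gating layer scores every expert per frame and dispatches each frame to its two highest-scoring feed-forward expert networks. All names, sizes, and the ReLU expert form here are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Hypothetical top-2 mixture-of-experts layer over per-frame features."""

    def __init__(self, dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.gate = rng.standard_normal((dim, num_experts)) * 0.1
        self.experts = [rng.standard_normal((dim, dim)) * 0.1
                        for _ in range(num_experts)]

    def __call__(self, x):
        # x: (T, dim) output of the previous multi-head attention layer
        scores = softmax(x @ self.gate)             # gate scores, (T, num_experts)
        top2 = np.argsort(scores, axis=-1)[:, -2:]  # two best experts per frame
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            w = scores[t, top2[t]]
            w = w / w.sum()                         # renormalize the pair's weights
            for k, e in zip(w, top2[t]):
                # each expert is a small ReLU feed-forward network here
                out[t] += k * np.maximum(x[t] @ self.experts[e], 0.0)
        return out
```

Routing each frame to only a pair of experts keeps the per-frame compute roughly constant while the total parameter count grows with the number of experts, which is the usual motivation for MoE layers in large encoders.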
2.
Publication Number: US11942076B2
Publication Date: 2024-03-26
Application Number: US17651315
Filing Date: 2022-02-16
Applicant: Google LLC
Inventor: Ke Hu , Golan Pundak , Rohit Prakash Prabhavalkar , Antoine Jean Bruguier , Tara N. Sainath
IPC: G10L15/30 , G10L15/02 , G10L15/06 , G10L15/187 , G10L15/193 , G10L15/28 , G10L15/32 , G10L25/30
CPC classification number: G10L15/063 , G10L15/02 , G10L15/187 , G10L15/193 , G10L15/285 , G10L15/32 , G10L25/30 , G10L2015/025
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
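The rescoring step can be sketched as a simple boost applied to phoneme-sequence hypotheses whose pronunciation matches a term on the biasing list, before wordpiece and phoneme scores enter the decoding graph. The boost value, the tuple representation of phoneme sequences, and the toy pronunciations below are all illustrative assumptions.

```python
def rescore_phonemes(phoneme_scores, biasing_prons, boost=4.0):
    """Add a fixed log-score boost to hypotheses matching a biasing pronunciation.

    phoneme_scores: dict mapping phoneme-sequence tuples to log scores.
    biasing_prons:  set of phoneme-sequence tuples for the biasing term list.
    """
    return {seq: score + (boost if seq in biasing_prons else 0.0)
            for seq, score in phoneme_scores.items()}

# Toy example: a foreign-language name's phoneme sequence is initially the
# lower-scoring hypothesis, but the biasing list promotes it.
scores = {
    ("g", "r", "@", "n", "o", "b", "l"): -7.5,      # correct foreign pronunciation
    ("g", "r", "e", "n", "o", "b", "@", "l"): -6.0, # anglicized misrecognition
}
biasing = {("g", "r", "@", "n", "o", "b", "l")}
rescored = rescore_phonemes(scores, biasing)
best = max(rescored, key=rescored.get)
```

After rescoring, the biased pronunciation scores -3.5 and overtakes the unbiased hypothesis, which is the intended effect of biasing toward out-of-language terms.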
-
公开(公告)号:US20230298563A1
公开(公告)日:2023-09-21
申请号:US18186157
申请日:2023-03-18
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Yanzhang He , Rohit Prabhavalkar , Sepand Mavandadi , Weiran Wang , Trevor Strohman
CPC classification number: G10L13/08 , G10L15/16 , G10L15/063
Abstract: A method of text-only and semi-supervised training for deliberation includes receiving training data including unspoken textual utterances that are each not paired with any corresponding spoken utterance of non-synthetic speech, and training a deliberation model that includes a text encoder and a deliberation decoder on the unspoken textual utterances. The method also includes receiving, at the trained deliberation model, first-pass hypotheses and non-causal acoustic embeddings. The first-pass hypotheses are generated by a recurrent neural network-transducer (RNN-T) decoder for the non-causal acoustic embeddings encoded by a non-causal encoder. The method also includes encoding, using the text encoder, the first-pass hypotheses generated by the RNN-T decoder, and generating, using the deliberation decoder attending to both the first-pass hypotheses and the non-causal acoustic embeddings, second-pass hypotheses.
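The key structural point, a deliberation decoder attending to both the text-encoded first-pass hypotheses and the non-causal acoustic embeddings, can be sketched with single-query dot-product attention over the two memories. Every name and shape below is an illustrative assumption; the patent does not specify this form of attention.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    """Single-query dot-product attention; returns a context vector over `memory`."""
    weights = softmax(memory @ query)
    return weights @ memory

rng = np.random.default_rng(0)
D = 8
text_enc = rng.standard_normal((4, D))  # text-encoded first-pass hypothesis tokens
acoustic = rng.standard_normal((6, D))  # non-causal acoustic embeddings
state = rng.standard_normal(D)          # current deliberation-decoder state

# The deliberation decoder attends to BOTH sources; the second-pass
# prediction is conditioned on the concatenated context vectors.
ctx = np.concatenate([attend(state, text_enc), attend(state, acoustic)])
```

Conditioning on both memories is what distinguishes deliberation from plain rescoring: the second pass can correct the first-pass text while still consulting the audio.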
4.
Publication Number: US20220270597A1
Publication Date: 2022-08-25
Application Number: US17182592
Filing Date: 2021-02-23
Applicant: Google LLC
Inventor: David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian McGraw
Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
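The two aggregation steps at the end, sub-word scores to word-level scores, then word-level scores to an utterance-level score, can be sketched as follows. Taking the score of each word's final wordpiece and then averaging over words is one plausible choice of aggregation, assumed here for illustration; the patent does not commit to these particular functions, and the CEM's attention layers are out of scope for this sketch.

```python
def word_confidences(piece_scores, piece_ends):
    """Collapse per-wordpiece confidence scores to per-word scores.

    piece_scores: confidence score for each hypothesized wordpiece.
    piece_ends:   piece_ends[i] is True when wordpiece i ends a word.
    """
    words, current = [], []
    for score, is_end in zip(piece_scores, piece_ends):
        current.append(score)
        if is_end:
            words.append(current[-1])  # take the word's final-piece score
            current = []
    return words

def utterance_confidence(word_scores):
    """Aggregate word-level scores into one utterance-level score (mean here)."""
    return sum(word_scores) / len(word_scores)

# "play ar-tist" -> two words spread over three wordpieces
scores = [0.9, 0.6, 0.8]
ends = [True, False, True]
w = word_confidences(scores, ends)   # one score per word
u = utterance_confidence(w)
```

Word-level scores matter because wordpiece tokenizations vary across models, so downstream consumers (e.g. deciding whether to re-prompt the user) are usually defined over words, not pieces.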
5.
Publication Number: US11270687B2
Publication Date: 2022-03-08
Application Number: US16861190
Filing Date: 2020-04-28
Applicant: Google LLC
Inventor: Ke Hu , Antoine Jean Bruguier , Tara N. Sainath , Rohit Prakash Prabhavalkar , Golan Pundak
IPC: G10L15/30 , G10L15/06 , G10L15/02 , G10L15/187 , G10L15/193 , G10L15/28 , G10L15/32 , G10L25/30
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
6.
Publication Number: US20240428786A1
Publication Date: 2024-12-26
Application Number: US18826655
Filing Date: 2024-09-06
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman
IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
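The data flow in this abstract, first encoder, first-pass transducer decode, text encoding of the hypothesis, second encoder, then a second pass conditioned on both, can be traced end to end in a toy NumPy pipeline. The linear-map "encoders", greedy argmax "decoders", and all sizes are stand-in assumptions chosen only to make the wiring concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 8, 16  # toy vocabulary and feature sizes (illustrative only)

W1 = rng.standard_normal((D, D)) * 0.1
W2 = rng.standard_normal((D, D)) * 0.1
E  = rng.standard_normal((V, D)) * 0.1      # token embedding table
P1 = rng.standard_normal((D, V)) * 0.1
P2 = rng.standard_normal((2 * D, V)) * 0.1

def first_encoder(frames):                  # stand-in for the first encoder
    return np.tanh(frames @ W1)

def transducer_decode(feats, proj):         # greedy stand-in for a transducer decoder
    return np.argmax(feats @ proj, axis=-1)

def text_encoder(tokens):                   # embeds first-pass hypothesis tokens
    return E[tokens]

def second_encoder(feats):                  # stand-in for the second encoder
    return np.tanh(feats @ W2)

frames = rng.standard_normal((5, D))        # sequence of acoustic frames
h1 = first_encoder(frames)                  # first higher order feature representation
hyp1 = transducer_decode(h1, P1)            # first-pass speech recognition hypothesis
txt = text_encoder(hyp1)                    # text encoding of the first-pass hypothesis
h2 = second_encoder(h1)                     # second higher order feature representation
hyp2 = transducer_decode(np.concatenate([h2, txt], axis=-1), P2)  # second pass
```

The notable design choice the abstract describes is that the second-pass decoder consumes the first pass's *text* (via the text encoder) in addition to the refined acoustic features, rather than only rescoring first-pass lattices.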
7.
Publication Number: US11610586B2
Publication Date: 2023-03-21
Application Number: US17182592
Filing Date: 2021-02-23
Applicant: Google LLC
Inventor: David Qiu , Qiujia Li , Yanzhang He , Yu Zhang , Bo Li , Liangliang Cao , Rohit Prabhavalkar , Deepti Bhatia , Wei Li , Ke Hu , Tara Sainath , Ian McGraw
Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
8.
Publication Number: US20200349923A1
Publication Date: 2020-11-05
Application Number: US16861190
Filing Date: 2020-04-28
Applicant: Google LLC
Inventor: Ke Hu , Antoine Jean Bruguier , Tara N. Sainath , Rohit Prakash Prabhavalkar , Golan Pundak
IPC: G10L15/06 , G10L15/187 , G10L15/193 , G10L15/32 , G10L15/28 , G10L25/30 , G10L15/02
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
9.
Publication Number: US12118988B2
Publication Date: 2024-10-15
Application Number: US17933307
Filing Date: 2022-09-19
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman
IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22
CPC classification number: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/063 , G10L15/083 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
10.
Publication Number: US12027158B2
Publication Date: 2024-07-02
Application Number: US18164923
Filing Date: 2023-02-06
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Ruoming Pang , Rohit Prakash Prabhavalkar
CPC classification number: G10L15/1815 , G06N3/049 , G10L15/063 , G10L15/16 , G10L15/187 , G10L19/0018
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
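The two-attention structure in this abstract, one mechanism attending to the encoded acoustic frames and one to the encoded first-pass hypothesis, each yielding a context vector that the context vector decoder then consumes, can be made concrete with a small sketch. The dot-product attention form, the joint linear decode, and all sizes are illustrative assumptions rather than the patent's specification.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, memory):
    """Dot-product attention over `memory` rows; returns one context vector."""
    return softmax(memory @ query) @ memory

rng = np.random.default_rng(1)
D, V = 8, 6   # illustrative feature and vocabulary sizes
acoustic = rng.standard_normal((7, D))  # encoded acoustic frames
hyp_enc = rng.standard_normal((3, D))   # hypothesis-encoder output for the first pass
state = rng.standard_normal(D)          # current context-vector-decoder state

c1 = attention(state, acoustic)         # first attention mechanism -> first context vector
c2 = attention(state, hyp_enc)          # second attention mechanism -> second context vector

# The context vector decoder combines both context vectors to form the
# next second-pass symbol (a single linear layer stands in for it here).
W_out = rng.standard_normal((2 * D, V)) * 0.1
logits = np.concatenate([c1, c2]) @ W_out
second_pass_token = int(np.argmax(logits))
```

Keeping the two attention mechanisms separate lets the decoder weight audio evidence and first-pass text independently at every step, which is the core of the deliberation idea.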