-
Publication No.: US20240144917A1
Publication Date: 2024-05-02
Application No.: US18494763
Application Date: 2023-10-25
Applicant: Google LLC
Inventor: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays
IPC: G10L15/16
CPC classification number: G10L15/16
Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription. The method also includes training the exporter network based on the exporter decoder losses while parameters of the base encoder are frozen.
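The following is a minimal plain-Python sketch of the training setup above. It rests on three assumptions. First, one fixed linear layer (`base_w`) stands in for the frozen base encoder. Second, the exporter network and exporter decoder are collapsed into a single trainable linear layer (`exporter_w`) followed by a softmax. Third, each frame carries a frame-level label, in place of the transcription-level loss the patent would compute. All names are hypothetical. The sketch shows one thing: only the exporter weights receive gradient updates.

```python
import math
from typing import List

Matrix = List[List[float]]

def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_exporter(frames: List[List[float]], labels: List[int], base_w: Matrix,
                   exporter_w: Matrix, lr: float = 0.5, epochs: int = 50) -> List[float]:
    """SGD on exporter_w only; base_w is read but never written (frozen base encoder).

    Returns the mean per-frame cross-entropy loss for each epoch.
    """
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(frames, labels):
            h1 = matvec(base_w, x)           # first encoded representation (frozen)
            logits = matvec(exporter_w, h1)  # exporter + decoder collapsed into one layer
            p = softmax(logits)
            total += -math.log(p[y])         # per-frame loss against the label
            # d(cross-entropy)/d(logits) = p - onehot(y)
            grad = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]
            for i, gi in enumerate(grad):
                for j, hj in enumerate(h1):
                    exporter_w[i][j] -= lr * gi * hj
        losses.append(total / len(frames))
    return losses

# Toy usage: a fixed 2->3 "base encoder" and a zero-initialized 3->2 exporter.
base_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
exporter_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
frozen_before = [row[:] for row in base_w]
losses = train_exporter([[1.0, 0.0], [0.0, 1.0]], [0, 1], base_w, exporter_w)
```

Freezing is expressed here simply by never writing to `base_w`. In an autodiff framework you would more typically exclude the base-encoder parameters from the optimizer, or stop gradients through them.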
-
Publication No.: US20230186907A1
Publication Date: 2023-06-15
Application No.: US18164923
Application Date: 2023-02-06
Applicant: Google LLC
Inventor: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
CPC classification number: G10L15/1815, G06N3/049, G10L15/063, G10L15/16, G10L15/187, G10L19/0018
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
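This is a minimal sketch of how the two context vectors in this deliberation setup could be formed. It assumes plain dot-product attention, with the current second-pass decoder state as the query, and it assumes concatenation is how the context vector decoder receives both contexts. The hypothesis encoder is not modeled: `encoded_hypothesis` is taken as already encoded. Function names are illustrative.

```python
import math
from typing import List, Sequence

def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def attend(query: Sequence[float], keys: List[Sequence[float]]) -> List[float]:
    """Dot-product attention: average of keys weighted by softmax(query . key)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]

def deliberation_context(decoder_state: Sequence[float],
                         encoded_frames: List[Sequence[float]],
                         encoded_hypothesis: List[Sequence[float]]) -> List[float]:
    """Input for one second-pass decoder step, built from two attention contexts."""
    c1 = attend(decoder_state, encoded_frames)      # first context vector: acoustics
    c2 = attend(decoder_state, encoded_hypothesis)  # second context vector: first-pass text
    return c1 + c2  # list concatenation of the two context vectors

out = deliberation_context([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]])
```

With a zero query, both acoustic frames receive equal weight. The single hypothesis encoding receives all of its attention's weight.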
-
Publication No.: US11475880B2
Publication Date: 2022-10-18
Application No.: US16809403
Application Date: 2020-03-04
Applicant: Google LLC
Inventor: Shuo-yiin Chang, Rohit Prakash Prabhavalkar, Gabor Simko, Tara N. Sainath, Bo Li, Yangzhang He
Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
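This sketch shows the control flow described above. It assumes a hypothetical per-chunk model step that returns a partial transcript together with an end-of-utterance flag, which plays the role of the joint endpoint indication. The loop stops consuming audio as soon as the flag is set. In the toy step, the endpoint is signaled by a made-up `<eou>` chunk, purely for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical model interface: one audio chunk in, (partial transcript, endpoint flag) out.
ModelStep = Callable[[str], Tuple[str, bool]]

def recognize_until_endpoint(chunks: Iterable[str], step: ModelStep) -> Tuple[List[str], int]:
    """Run a joint recognizer/endpointer over streaming chunks; stop at the first endpoint.

    Returns the partial results emitted and how many chunks were processed.
    """
    partials: List[str] = []
    processed = 0
    for chunk in chunks:
        text, is_end = step(chunk)
        processed += 1
        partials.append(text)
        if is_end:
            break  # end of utterance detected: later audio is never processed
    return partials, processed

def make_toy_step() -> ModelStep:
    """Toy stand-in for the model: each chunk is one word; "<eou>" signals the endpoint."""
    words: List[str] = []

    def step(chunk: str) -> Tuple[str, bool]:
        if chunk == "<eou>":
            return " ".join(words), True
        words.append(chunk)
        return " ".join(words), False

    return step

partials, n = recognize_until_endpoint(["play", "some", "jazz", "<eou>", "noise"], make_toy_step())
```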
-
Publication No.: US11423883B2
Publication Date: 2022-08-23
Application No.: US16836445
Application Date: 2020-03-31
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
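This is a sketch of a bias attention step. It assumes the bias encoder has already turned each bias phrase into a fixed-size embedding. It also assumes plain dot-product attention, queried by the decoder state. The `<no-bias>` entry with a zero embedding is an illustrative assumption: it gives the attention somewhere to go when no phrase is relevant. The phrases and embeddings are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def bias_attention(query: Sequence[float],
                   phrase_embeddings: Dict[str, List[float]]) -> Tuple[Dict[str, float], List[float]]:
    """Attention weights over bias phrases and the resulting bias context vector."""
    names = list(phrase_embeddings)
    keys = [phrase_embeddings[n] for n in names]
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return dict(zip(names, weights)), context

phrases = {
    "<no-bias>": [0.0, 0.0],  # illustrative "ignore all phrases" option
    "call mom": [1.0, 0.0],
    "play jazz": [0.0, 1.0],
}
weights, bias_context = bias_attention([5.0, 0.0], phrases)
```

The decoder would consume `bias_context` alongside the acoustic context from the first attention module.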
-
Publication No.: US11295739B2
Publication Date: 2022-04-05
Application No.: US16527487
Application Date: 2019-07-31
Applicant: Google LLC
Inventor: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
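This is a streaming sketch of attention-based key phrase spotting, under three assumptions. The encoder's outputs arrive one vector at a time. The attention mechanism uses a hypothetical learned `query` vector. A logistic output layer, `out_w`, turns the attention output into a detection probability. After every new encoding, attention is recomputed over everything seen so far, so a decision is available while audio is still arriving.

```python
import math
from typing import Iterable, List, Sequence

def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def spot_key_phrase(encodings: Iterable[Sequence[float]], query: Sequence[float],
                    out_w: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Emit a detection decision after each new encoding arrives."""
    seen: List[Sequence[float]] = []
    decisions: List[bool] = []
    for enc in encodings:
        seen.append(enc)
        scores = [sum(q * e for q, e in zip(query, s)) for s in seen]
        weights = softmax(scores)
        attn = [sum(w * s[d] for w, s in zip(weights, seen)) for d in range(len(enc))]
        logit = sum(a * w for a, w in zip(attn, out_w))
        prob = 1.0 / (1.0 + math.exp(-logit))  # probability the key phrase is present
        decisions.append(prob > threshold)
    return decisions

decisions = spot_key_phrase([[0.0, 1.0], [0.0, 1.0], [3.0, 0.0]],
                            query=[1.0, 0.0], out_w=[4.0, -4.0])
```

Recomputing attention over the whole prefix costs time proportional to the prefix length at every step. A deployed system would more likely attend over a bounded window.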
-
Publication No.: US11217231B2
Publication Date: 2022-01-04
Application No.: US16863766
Application Date: 2020-04-30
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
IPC: G10L15/187, G06N20/10, G10L19/04, G10L15/08
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
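This sketch derives grapheme and phoneme data from biasing phrases before they reach the model. It assumes a toy pronunciation lexicon (`LEXICON`). The `<space>`, `<wb>` and `<unk>` markers are illustrative. A real system would use a full lexicon or a grapheme-to-phoneme model, and would then embed and encode these sequences.

```python
from typing import Dict, List

# Hypothetical toy lexicon (ARPAbet-style symbols, lowercase).
LEXICON: Dict[str, List[str]] = {
    "call": ["k", "ao", "l"],
    "mom": ["m", "aa", "m"],
}

def phrase_features(phrase: str, lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Grapheme and phoneme sequences for one biasing phrase."""
    graphemes: List[str] = []
    phonemes: List[str] = []
    for i, word in enumerate(phrase.lower().split()):
        if i:
            graphemes.append("<space>")  # word separator in the grapheme stream
            phonemes.append("<wb>")      # word boundary in the phoneme stream
        graphemes.extend(word)           # one symbol per character
        phonemes.extend(lexicon.get(word, ["<unk>"]))
    return {"graphemes": graphemes, "phonemes": phonemes}

features = phrase_features("Call Mom", LEXICON)
```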
-
Publication No.: US20210358491A1
Publication Date: 2021-11-18
Application No.: US17443557
Application Date: 2021-07-27
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
IPC: G10L15/197, G10L15/16, G10L15/06, G10L15/02, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
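The abstract says only that the loss uses N-best lists of decoded hypotheses. One well-known loss of that kind is minimum word error rate (MWER) training, and this sketch assumes it. The model's hypothesis log-probabilities are renormalized over the N-best list. The loss is then the expected number of word errors, measured relative to the list's mean error count. This illustrates the technique; it is not necessarily the patented loss.

```python
import math
from typing import List, Sequence

def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between word sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r))
        prev = cur
    return prev[-1]

def expected_word_error_loss(nbest: List[Sequence[str]], log_probs: List[float],
                             ref: Sequence[str]) -> float:
    """MWER-style loss: sum_i P(y_i) * (errors_i - mean errors), P renormalized over the list."""
    m = max(log_probs)
    unnorm = [math.exp(lp - m) for lp in log_probs]
    total = sum(unnorm)
    probs = [u / total for u in unnorm]
    errors = [word_errors(h, ref) for h in nbest]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))

loss = expected_word_error_loss([["call", "mom"], ["call", "tom"]],
                                [math.log(0.75), math.log(0.25)], ["call", "mom"])
```

Shifting probability mass toward the hypothesis with fewer errors lowers the loss. With equal scores on both hypotheses, the loss here would be zero.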
-
Publication No.: US11107463B2
Publication Date: 2021-08-31
Application No.: US16529252
Application Date: 2019-08-01
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
IPC: G10L15/197, G10L15/16, G10L15/06, G10L15/02, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
-
Publication No.: US20200066271A1
Publication Date: 2020-02-27
Application No.: US16527487
Application Date: 2019-07-31
Applicant: Google LLC
Inventor: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
-
Publication No.: US12051407B2
Publication Date: 2024-07-30
Application No.: US17815049
Application Date: 2022-07-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.