-
Publication number: US11996103B2
Publication date: 2024-05-28
Application number: US17811605
Filing date: 2022-07-11
Applicant: Google LLC
Inventor: Petar Aleksic, Pedro J. Moreno Mengibar
IPC: G10L15/00, G06F16/632, G10L15/04, G10L15/19, G10L15/197, G10L15/22, G10L15/26, G10L15/08, G10L15/183
CPC classification number: G10L15/26, G06F16/632, G10L15/04, G10L15/19, G10L15/197, G10L2015/085, G10L15/183, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
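The segment-by-segment loop in this abstract can be illustrated with a minimal Python sketch. Everything named here (`CONTEXT_KEYWORDS`, `adjust_weights`, `pick_candidate`, the additive `boost`) is a hypothetical stand-in chosen for illustration; the abstract does not say how contexts are detected or how weights are combined with recognizer scores.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical context -> trigger-word table (an assumption, not from the patent).
CONTEXT_KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"tennis", "player", "match"},
    "food": {"menu", "dessert", "recipe"},
}

# One hypothesis: (text, base score, context it belongs to or None).
Hypothesis = Tuple[str, float, Optional[str]]


def adjust_weights(weights: Dict[str, float], candidate: str,
                   boost: float = 0.5) -> Dict[str, float]:
    """Return a copy of `weights` with a boost for each context the candidate mentions."""
    words = set(candidate.lower().split())
    updated = dict(weights)
    for context, keywords in CONTEXT_KEYWORDS.items():
        if words & keywords:
            updated[context] = updated.get(context, 0.0) + boost
    return updated


def pick_candidate(hypotheses: List[Hypothesis], weights: Dict[str, float]) -> str:
    """Pick the hypothesis with the highest base score plus context weight."""
    return max(hypotheses, key=lambda h: h[1] + weights.get(h[2], 0.0))[0]


def transcribe(segments: List[List[Hypothesis]]) -> str:
    """Decode segments in order, letting earlier segments bias later ones."""
    weights: Dict[str, float] = {}
    parts: List[str] = []
    for hypotheses in segments:
        best = pick_candidate(hypotheses, weights)
        parts.append(best)
        weights = adjust_weights(weights, best)  # affects the next segment only
    return " ".join(parts)


segments = [
    [("tennis player", 0.9, None)],
    [("return of serve", 0.4, "sports"), ("return of surf", 0.5, None)],
]
print(transcribe(segments))
```

In this toy run the first segment raises the weight of the assumed "sports" context, which is enough for "return of serve" to outscore "return of surf" in the second segment.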
-
Publication number: US11942091B2
Publication date: 2024-03-26
Application number: US17251465
Filing date: 2020-01-17
Applicant: Google LLC
Inventor: Benjamin Haynor, Petar Aleksic
IPC: G10L15/26, G10L15/16, G10L15/193, G10L15/22, G10L15/30
CPC classification number: G10L15/26, G10L15/16, G10L15/193, G10L15/22, G10L15/30
Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one or more candidate recognitions of the alphanumeric sequence using the contextual biasing FST, where the FST further comprises a grammar as well as a speller finite state transducer.
-
Publication number: US11810568B2
Publication date: 2023-11-07
Application number: US17118232
Filing date: 2020-12-10
Applicant: Google LLC
Inventor: Petar Aleksic, Pedro J. Moreno Mengibar
IPC: G10L15/26, G10L15/18, G10L15/183, G10L15/07, G10L15/197, G10L15/30, G10L15/22, G10L15/06, G10L15/08
CPC classification number: G10L15/26, G10L15/07, G10L15/183, G10L15/1815, G10L15/197, G10L15/30, G10L2015/0635, G10L2015/088, G10L2015/228
Abstract: A computer-implemented method for transcribing an utterance includes receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.
-
Publication number: US11797763B2
Publication date: 2023-10-24
Application number: US17443330
Filing date: 2021-07-24
Applicant: Google LLC
Inventor: Evgeny A. Cherepanov, Gleb Skobeltsyn, Jakob Nicolaus Foerster, Petar Aleksic, Assaf Avner Hurwitz Michaely
IPC: G10L15/22, G10L15/01, G10L15/24, G06F40/232, G10L15/32, G10L15/26, G10L15/197, G10L15/187, G06F3/16, G10L15/19, G10L15/30, G10L15/08
CPC classification number: G06F40/232, G06F3/167, G10L15/187, G10L15/19, G10L15/197, G10L15/22, G10L15/26, G10L15/30, G10L15/32, G10L2015/086, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a first voice input from a user device; generating a first recognition output; receiving a user selection of one or more terms in the first recognition output; receiving a second voice input spelling a correction of the user selection; determining a corrected recognition output for the selected portion; and providing a second recognition output that merges the first recognition output and the corrected recognition output.
-
Publication number: US11682383B2
Publication date: 2023-06-20
Application number: US17337400
Filing date: 2021-06-02
Applicant: Google LLC
Inventor: Petar Aleksic, Pedro J. Moreno Mengibar
IPC: G10L15/187, G10L15/07, G10L15/18, G10L15/197, G10L15/30, G10L15/01
CPC classification number: G10L15/07, G10L15/187, G10L15/1815, G10L15/197, G10L15/01, G10L15/30
Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
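As a rough illustration of the n-gram expansion and candidate rescoring steps in this abstract, consider the Python sketch below. The `related` lookup table, the flat `bonus`, and all function names are assumptions made for the example; the abstract does not specify how the expanded set is generated, and the separate step of adjusting the language model itself is omitted here.

```python
from typing import Dict, Iterable, Set


def expand_ngrams(initial: Iterable[str], related: Dict[str, Iterable[str]]) -> Set[str]:
    """Expand the context n-grams with related n-grams from a lookup table (assumed)."""
    expanded = set(initial)
    for ngram in list(expanded):
        expanded.update(related.get(ngram, ()))
    return expanded


def rescore(candidates: Dict[str, float], expanded: Set[str],
            bonus: float = 0.2) -> Dict[str, float]:
    """Add a flat bonus to every candidate that appears in the expanded set."""
    return {text: score + (bonus if text in expanded else 0.0)
            for text, score in candidates.items()}


def best_transcription(candidates: Dict[str, float], initial: Iterable[str],
                       related: Dict[str, Iterable[str]]) -> str:
    """Return the highest-scoring candidate after context-based rescoring."""
    scores = rescore(candidates, expand_ngrams(initial, related))
    return max(scores, key=scores.get)


# Hypothetical data: "pizza" comes from the context, the candidates from a recognizer.
related = {"pizza": ["pizzeria", "pepperoni"]}
candidates = {"pepper only": 0.5, "pepperoni": 0.4}
print(best_transcription(candidates, ["pizza"], related))
```

Without the context n-gram, "pepper only" would win on its raw score; with "pizza" in the context, the expanded set contains "pepperoni" and the bonus tips the result.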
-
Publication number: US20220383862A1
Publication date: 2022-12-01
Application number: US17817176
Filing date: 2022-08-03
Applicant: Google LLC
Inventor: Petar Aleksic, Pedro J Moreno Mengibar
IPC: G10L15/187, G10L15/02, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross-lingual speech recognition are disclosed. In one aspect, a method includes the actions of determining a context of a second computing device. The actions further include identifying, by a first computing device, an additional pronunciation for a term of multiple terms. The actions further include including the additional pronunciation for the term in the lexicon. The actions further include receiving audio data of an utterance. The actions further include generating a transcription of the utterance by using the lexicon that includes the multiple terms and the pronunciation for each of the multiple terms and the additional pronunciation for the term. The actions further include after generating the transcription of the utterance, removing the additional pronunciation for the term from the lexicon. The actions further include providing, for output, the transcription.
-
Publication number: US11437025B2
Publication date: 2022-09-06
Application number: US16593564
Filing date: 2019-10-04
Applicant: Google LLC
Inventor: Petar Aleksic, Pedro J. Moreno Mengibar
IPC: G10L15/187, G10L15/02, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross-lingual speech recognition are disclosed. In one aspect, a method includes the actions of determining a context of a second computing device. The actions further include identifying, by a first computing device, an additional pronunciation for a term of multiple terms. The actions further include including the additional pronunciation for the term in the lexicon. The actions further include receiving audio data of an utterance. The actions further include generating a transcription of the utterance by using the lexicon that includes the multiple terms and the pronunciation for each of the multiple terms and the additional pronunciation for the term. The actions further include after generating the transcription of the utterance, removing the additional pronunciation for the term from the lexicon. The actions further include providing, for output, the transcription.
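The add-then-remove handling of the additional pronunciation described in this abstract resembles a scoped resource, which the following Python sketch expresses with a context manager. The dictionary lexicon, the exact-match `transcribe` function, and the ASCII phoneme strings are all illustrative assumptions; how the device context selects the additional pronunciation is not modeled.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List

Lexicon = Dict[str, List[str]]  # term -> list of pronunciations (assumed layout)


@contextmanager
def temporary_pronunciation(lexicon: Lexicon, term: str,
                            pronunciation: str) -> Iterator[Lexicon]:
    """Add a pronunciation for `term`, then remove it once the block exits."""
    lexicon.setdefault(term, []).append(pronunciation)
    try:
        yield lexicon
    finally:
        lexicon[term].remove(pronunciation)


def transcribe(phonemes: str, lexicon: Lexicon) -> str:
    """Toy recognizer: return the term whose pronunciation matches exactly."""
    for term, pronunciations in lexicon.items():
        if phonemes in pronunciations:
            return term
    return "<unknown>"


lexicon: Lexicon = {"croissant": ["k r uh s aa n t"]}
with temporary_pronunciation(lexicon, "croissant", "k w aa s aa n") as lex:
    inside = transcribe("k w aa s aa n", lex)
outside = transcribe("k w aa s aa n", lexicon)
print(inside, outside)
```

The `finally` clause guarantees that the lexicon returns to its original state even if recognition raises an error, which mirrors the removal step that follows transcription in the abstract.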
-
Publication number: US20220229992A1
Publication date: 2022-07-21
Application number: US17589186
Filing date: 2022-01-31
Applicant: GOOGLE LLC
Inventor: Leonid Velikovich, Petar Aleksic, Pedro Moreno
IPC: G06F40/295, G06F40/30, G10L15/06, G10L15/187, G10L15/22
Abstract: Speech processing techniques are disclosed that enable determining a text representation of named entities in captured audio data. Various implementations include determining the location of a carrier phrase in a word lattice representation of the captured audio data, where the carrier phrase provides an indication of a named entity. Additional or alternative implementations include matching a candidate named entity with the portion of the word lattice, and augmenting the word lattice with the matched candidate named entity.
-
Publication number: US11341972B2
Publication date: 2022-05-24
Application number: US17078030
Filing date: 2020-10-22
Applicant: Google LLC
Inventor: Alexander H. Gruenstein, Petar Aleksic
IPC: G10L15/26, G10L15/30, G10L15/18, G10L15/22, G10L15/32, G10L15/193, G10L15/197
Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
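The selection rule in this abstract (output the personalized transcription when the general one contains a predefined term) can be sketched in Python as follows. The recognizers are passed in as plain callables, which is an assumption for the example, and falling back to the second transcription when no term matches is also an assumption, since the abstract does not state what happens in that case.

```python
from typing import Callable, Iterable

Recognizer = Callable[[bytes], str]  # hypothetical recognizer interface


def choose_transcription(audio: bytes,
                         personalized: Recognizer,
                         general: Recognizer,
                         trigger_terms: Iterable[str]) -> str:
    """Run both recognizers and pick the output according to the trigger terms."""
    first = personalized(audio)   # language model based on user-specific data
    second = general(audio)       # language model independent of user data
    words = set(second.lower().split())
    if any(term.lower() in words for term in trigger_terms):
        return first
    return second  # assumed fallback, not stated in the abstract


# Stand-in recognizers that ignore the audio and return fixed strings.
personalized = lambda audio: "call Katarzyna"
general = lambda audio: "call cat are zeena"

print(choose_transcription(b"", personalized, general, {"call"}))
```

Here the general recognizer hears the command word "call", which signals that a contact name probably follows, so the personalized result is preferred.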
-
Publication number: US11282525B2
Publication date: 2022-03-22
Application number: US17009494
Filing date: 2020-09-01
Applicant: Google LLC
Inventor: Assaf Hurwitz Michaely, Petar Aleksic, Pedro J. Moreno Mengibar
Abstract: A method includes receiving a speech input from a user and obtaining context metadata associated with the speech input. The method also includes generating a raw speech recognition result corresponding to the speech input and selecting a list of one or more denormalizers to apply to the generated raw speech recognition result based on the context metadata associated with the speech input. The generated raw speech recognition result includes normalized text. The method also includes denormalizing the generated raw speech recognition result into denormalized text by applying the list of the one or more denormalizers in sequence to the generated raw speech recognition result.
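One way to picture the context-driven denormalizer chain in this abstract is the short Python sketch below. The two denormalizers, the `application` metadata key, and the `DENORMALIZERS_BY_CONTEXT` table are hypothetical examples; the abstract only states that a list is selected from context metadata and applied in sequence.

```python
from typing import Callable, Dict, List

Denormalizer = Callable[[str], str]


def verbalized_punctuation(text: str) -> str:
    """Turn spoken punctuation words into symbols."""
    return text.replace(" comma", ",").replace(" period", ".")


def capitalize_first(text: str) -> str:
    """Uppercase the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]


# Hypothetical mapping from an application context to an ordered denormalizer list.
DENORMALIZERS_BY_CONTEXT: Dict[str, List[Denormalizer]] = {
    "dictation": [verbalized_punctuation, capitalize_first],
    "search": [],
}


def select_denormalizers(context_metadata: Dict[str, str]) -> List[Denormalizer]:
    """Choose the denormalizer list from the context metadata (assumed key)."""
    return DENORMALIZERS_BY_CONTEXT.get(context_metadata.get("application", ""), [])


def denormalize(raw: str, context_metadata: Dict[str, str]) -> str:
    """Apply the selected denormalizers to the raw result, in order."""
    text = raw
    for step in select_denormalizers(context_metadata):
        text = step(text)
    return text


raw = "hello comma how are you period"
print(denormalize(raw, {"application": "dictation"}))
print(denormalize(raw, {"application": "search"}))
```

Keeping each denormalizer as an independent function makes the order explicit and lets different contexts reuse the same steps in different combinations.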