Patent search ap:("Google LLC") AND inv:"Pedro J. Moreno Mengibar" Page 1

1.

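Granted invention patent
Sub-models for neural contextual biasing (in force)

Publication (announcement) number: US12230258B2

Publication (announcement) date: 2025-02-18

Application number: US17659836

Application date: 2022-04-19

Applicant: Google LLC

Inventor: Fadi Biadsy, Pedro J. Moreno Mengibar

IPC: G10L15/183, G06N3/04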

Abstract: A method for contextual biasing for speech recognition includes obtaining a base automatic speech recognition (ASR) model trained on non-biased data and a sub-model trained on biased data representative of a particular domain. The method includes receiving a speech recognition request including audio data characterizing an utterance captured in streaming audio. The method further includes determining whether the speech recognition request includes a contextual indicator indicating the particular domain. When the speech recognition request does not include the contextual indicator, the method includes generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data. When the speech recognition request includes the contextual indicator, the method includes biasing, using the sub-model, the base ASR model toward the particular domain and generating, using the biased base ASR model, a second speech recognition result of the utterance by processing the audio data.

2.

Granted invention patent
Language model biasing system (in force)

Publication (announcement) number: US12183328B2

Publication (announcement) date: 2024-12-31

Application number: US18318495

Application date: 2023-05-16

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/187, G10L15/07, G10L15/18, G10L15/197, G10L15/01, G10L15/30

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
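
As a rough illustration of two of the steps named in this abstract (expanding an initial n-gram set and adjusting the score of candidates that match the expanded set), the following Python sketch may help. It is not the patented method: the `related` lookup table, the additive `boost`, the substring matching rule, and all function names are assumptions made for the example, and the language model adjustment step is not modeled at all.

```python
from typing import Dict, List, Set


def expand_ngrams(initial: Set[str], related: Dict[str, List[str]]) -> Set[str]:
    """Return the initial n-grams plus any related n-grams listed for them.

    `related` is a hypothetical lookup table; the abstract does not say
    how the expanded set is derived.
    """
    expanded = set(initial)
    for ngram in initial:
        expanded.update(related.get(ngram, []))
    return expanded


def adjust_scores(candidates: Dict[str, float], expanded: Set[str],
                  boost: float = 1.0) -> Dict[str, float]:
    """Add `boost` to every candidate that contains an expanded n-gram.

    Matching is done on whole words (assumed rule, for illustration only).
    """
    adjusted = {}
    for text, score in candidates.items():
        padded = f" {text.lower()} "
        hit = any(f" {ng.lower()} " in padded for ng in expanded)
        adjusted[text] = score + boost if hit else score
    return adjusted


# Toy data: context mentions a restaurant type; scores are log-probabilities.
context_ngrams = {"italian restaurant"}
related_terms = {"italian restaurant": ["pizza", "pasta"]}
expanded = expand_ngrams(context_ngrams, related_terms)

candidates = {"order a pizza": -4.2, "order a piece of": -4.0}
adjusted = adjust_scores(candidates, expanded, boost=0.5)
best = max(adjusted, key=adjusted.get)
print(best)
```

In this toy run the candidate containing "pizza" overtakes the acoustically preferred one only because of the boost, which is the effect the biasing step is meant to have.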

3.

Granted invention patent
Server side hotwording (in force)

Publication (announcement) number: US12094472B2

Publication (announcement) date: 2024-09-17

Application number: US18345077

Application date: 2023-06-30

Applicant: GOOGLE LLC

Inventor: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar

IPC: G10L15/30, G10L15/26, G10L15/32, G10L15/08, G10L15/183, G10L15/22

CPC classification number: G10L15/30, G10L15/26, G10L15/32, G10L2015/088, G10L15/183, G10L2015/223

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
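
The two-threshold flow in this abstract can be sketched as a simple cascade. The Python below is only an illustration under stated assumptions: the numeric scores, the default threshold values, the `transcribe` callback, and every function name are hypothetical, and the abstract does not specify how either score is computed.

```python
from typing import Callable, Optional


def server_verify(server_score: float, server_threshold: float,
                  transcribe: Callable[[], str]) -> Optional[str]:
    """Server side: return tagged text only if the stricter threshold is met."""
    if server_score >= server_threshold:
        return transcribe()
    return None


def handle_audio(client_score: float, server_score: float,
                 transcribe: Callable[[], str],
                 client_threshold: float = 0.5,
                 server_threshold: float = 0.9) -> Optional[str]:
    """Client side: forward audio to the server only after a loose first check.

    Scores stand in for "likelihood of being the key phrase" (assumed scale 0..1).
    """
    if server_threshold <= client_threshold:
        # The abstract requires the second threshold to be more restrictive.
        raise ValueError("server_threshold must be stricter than client_threshold")
    if client_score < client_threshold:
        return None  # audio never leaves the device
    return server_verify(server_score, server_threshold, transcribe)


# Hypothetical usage: the lambda stands in for the server's tagged transcription.
result = handle_audio(0.6, 0.95, lambda: "<hotword>ok google</hotword> what time is it")
print(result)
```

The design point is that the cheap on-device check can tolerate false accepts, because the stricter server check filters them before any tagged text is returned.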

4.

Published invention application
VOICE RECOGNITION SYSTEM (under examination, published)

Publication (announcement) number: US20240282309A1

Publication (announcement) date: 2024-08-22

Application number: US18650864

Application date: 2024-04-30

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/26, G06F16/632, G10L15/04, G10L15/08, G10L15/183, G10L15/19, G10L15/197, G10L15/22

CPC classification number: G10L15/26, G06F16/632, G10L15/04, G10L15/19, G10L15/197, G10L2015/085, G10L15/183, G10L15/22

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

5.

Granted invention patent
Speech recognition for keywords (in force)

Publication (announcement) number: US12026753B2

Publication (announcement) date: 2024-07-02

Application number: US17308624

Application date: 2021-05-05

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G06Q30/00, G06Q30/0251, G06Q30/0273, G10L13/00, G10L15/01, G10L15/06, G10L15/18, G10L15/08, G10L15/187, G10L15/26

CPC classification number: G06Q30/0275, G06Q30/0256, G10L13/00, G10L15/01, G10L15/06, G10L15/18, G10L2015/088, G10L15/187, G10L15/26

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser. The method further includes generating a score for the candidate adword based on a likelihood of a speech recognizer generating, based on an utterance of the candidate adword, a transcription that includes a word that is associated with an expected pronunciation of the candidate adword. The method further includes classifying, based at least on the score, the candidate adword as an appropriate adword for use in a bidding process for advertisements that are selected based on a transcription of a speech query or as not an appropriate adword for use in the bidding process for advertisements that are selected based on the transcription of the speech query.

6.

Published invention application
MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION (under examination, published)

Publication (announcement) number: US20240161732A1

Publication (announcement) date: 2024-05-16

Application number: US18418246

Application date: 2024-01-20

Applicant: Google LLC

Inventor: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen

IPC: G10L15/00, G10L15/07, G10L15/16

CPC classification number: G10L15/005, G10L15/07, G10L15/16, G10L2015/0631

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

7.

Granted invention patent
Automated calling system (in force)

Publication (announcement) number: US11741966B2

Publication (announcement) date: 2023-08-29

Application number: US17964141

Application date: 2022-10-12

Applicant: GOOGLE LLC

Inventor: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani

IPC: G10L15/26, H04M3/428, H04M1/663, G10L15/32, H04M3/51, H04M1/02

CPC classification number: G10L15/26, G10L15/32, H04M1/02, H04M1/663, H04M3/4286, H04M3/5191

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.

8.

Invention application
Multilingual Re-Scoring Models for Automatic Speech Recognition (in force)

Publication (announcement) number: US20220310081A1

Publication (announcement) date: 2022-09-29

Application number: US17701635

Application date: 2022-03-22

Applicant: Google LLC

Inventor: Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Bhuvana Ramabhadran, Parisa Haghani, Pedro J. Moreno Mengibar

IPC: G10L15/197, G10L15/16, G10L15/22, G10L15/00

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
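
The second-pass selection described in this abstract can be illustrated with a short Python sketch. The abstract only says the overall score is "based on" the three component scores, so the weighted sum used here, the weight values, and the names `Hypothesis`, `overall_score`, and `pick_best` are assumptions for the example, not the claimed formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    likelihood: float  # un-normalized likelihood score from the second pass
    lm: float          # external language model score
    prior: float       # standalone score modeling prior statistics


def overall_score(h: Hypothesis, lm_weight: float = 0.5,
                  prior_weight: float = 0.5) -> float:
    """Combine the three scores with a weighted sum (assumed combination rule)."""
    return h.likelihood + lm_weight * h.lm + prior_weight * h.prior


def pick_best(hyps: List[Hypothesis], lm_weight: float = 0.5,
              prior_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the highest overall score among the N candidates."""
    if not hyps:
        raise ValueError("need at least one candidate hypothesis")
    return max(hyps, key=lambda h: overall_score(h, lm_weight, prior_weight))


# Toy N-best list (N = 2) with made-up log-domain scores.
n_best = [
    Hypothesis("recognize speech", likelihood=-3.2, lm=-1.0, prior=-0.5),
    Hypothesis("wreck a nice beach", likelihood=-3.0, lm=-2.0, prior=-1.0),
]
print(pick_best(n_best).text)
```

With these toy numbers the first hypothesis wins even though its likelihood alone is lower, which shows how the language model and prior terms can reorder the first-pass candidates.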

9.

Granted invention patent
Voice recognition system (in force)

Publication (announcement) number: US11410660B2

Publication (announcement) date: 2022-08-09

Application number: US16837250

Application date: 2020-04-01

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G06F16/00, G06F16/33, G10L15/06, G10L15/26, G06F16/632, G10L15/19, G10L15/197, G10L15/04, G10L15/08, G10L15/22, G10L15/183

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

10.

Granted invention patent
Contextual tagging and biasing of grammars inside word lattices (in force)

Publication (announcement) number: US11386889B2

Publication (announcement) date: 2022-07-12

Application number: US16698280

Application date: 2019-11-27

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich

IPC: G10L15/197, G10L15/16, G10L15/18, G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include, based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription. The actions further include providing, for output, the selected candidate transcription as a transcription of the utterance.
