Patent search ap:("GOOGLE LLC") AND inv:"Petar Aleksic" Page 7

61.

Invention publication
VOICE TO TEXT CONVERSION BASED ON THIRD-PARTY AGENT CONTENT (under examination, published)

Publication No.: US20230260517A1

Publication date: 2023-08-17

Application No.: US18125606

Application date: 2023-03-23

Applicant: GOOGLE LLC

Inventor: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic

IPC: G10L15/26, G10L15/22, G06F40/284, G06F40/205, G06F40/30, G10L15/183, G10L15/18, G10L15/30

CPC classification number: G10L15/26, G06F40/30, G06F40/205, G06F40/284, G10L15/22, G10L15/30, G10L15/183, G10L15/1815, G10L2015/223, G10L2015/228

Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

62.

Invention application
MIXED MODEL SPEECH RECOGNITION (in force)

Publication No.: US20220262365A1

Publication date: 2022-08-18

Application No.: US17661837

Application date: 2022-05-03

Applicant: Google LLC

Inventor: Alexander H. Gruenstein, Petar Aleksic

IPC: G10L15/26, G10L15/18, G10L15/22, G10L15/32, G10L15/30

Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
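
The selection step in this abstract can be sketched roughly as follows. This is only an illustration: the two recognizers are hypothetical callables standing in for real speech recognizers, `trigger_terms` is an assumed name for the predefined set of terms, and the fallback to the second transcription is an assumption, since the abstract does not say what is output when no term is found.

```python
from typing import Callable, Iterable

# Hypothetical type: a recognizer maps raw audio bytes to a transcription.
Recognizer = Callable[[bytes], str]


def choose_transcription(
    audio: bytes,
    user_specific: Recognizer,
    user_independent: Recognizer,
    trigger_terms: Iterable[str],
) -> str:
    """Run both recognizers and pick which transcription to output."""
    first = user_specific(audio)       # language model based on user-specific data
    second = user_independent(audio)   # language model independent of user data
    second_words = set(second.lower().split())
    # If the second transcription contains a predefined term, output the first.
    if any(term.lower() in second_words for term in trigger_terms):
        return first
    # Assumption: otherwise fall back to the user-independent transcription.
    return second
```

With a trigger set such as `{"call"}`, a command like "call ..." would be answered with the transcription from the user-specific recognizer, which is more likely to spell a contact name correctly.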

63.

Granted invention patent
Voice recognition system (in force)

Publication No.: US11410660B2

Publication date: 2022-08-09

Application No.: US16837250

Application date: 2020-04-01

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G06F16/00, G06F16/33, G10L15/06, G10L15/26, G06F16/632, G10L15/19, G10L15/197, G10L15/04, G10L15/08, G10L15/22, G10L15/183

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
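
A minimal sketch of the segment-by-segment weighting described in this abstract is given below. It is not the patented method: the data layout (one dictionary of candidate scores per segment), the keyword-based context matching, and the additive `step` update are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def transcribe_segments(
    segment_candidates: List[Dict[str, float]],
    context_terms: Dict[str, Set[str]],
    step: float = 0.5,
) -> str:
    """Pick one candidate per segment, letting earlier picks re-weight contexts."""
    weights = {context: 1.0 for context in context_terms}
    chosen: List[str] = []
    for candidates in segment_candidates:

        def score(text: str) -> float:
            words = set(text.lower().split())
            # Contexts whose weight was raised earlier add a bonus to matching text.
            bonus = sum(
                weight - 1.0
                for context, weight in weights.items()
                if words & context_terms[context]
            )
            return candidates[text] + bonus

        best = max(candidates, key=score)
        chosen.append(best)
        # Raise the weight of every context associated with the chosen candidate.
        best_words = set(best.lower().split())
        for context in weights:
            if best_words & context_terms[context]:
                weights[context] += step
    return " ".join(chosen)
```

For example, if the first segment is recognized as "watch the tennis", a hypothetical "tennis" context gains weight, so "match" can beat a slightly higher-scoring "batch" in the second segment.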

64.

Granted invention patent
Contextual tagging and biasing of grammars inside word lattices (in force)

Publication No.: US11386889B2

Publication date: 2022-07-12

Application No.: US16698280

Application date: 2019-11-27

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich

IPC: G10L15/197, G10L15/16, G10L15/18, G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription. The actions further include providing, for output, the selected candidate transcription as a transcription of the utterance.
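
One way to picture the selection step is the sketch below. It makes several simplifying assumptions that are not in the abstract: the word lattice is flattened to a list of (transcription, confidence) pairs, grammars are regular expressions keyed by a context name, the grammar confidence is 1.0 on a full match and 0.0 otherwise, and the two scores are combined by a weighted sum.

```python
import re
from typing import Dict, List, Tuple


def select_candidate(
    candidates: List[Tuple[str, float]],
    context: str,
    grammars_by_context: Dict[str, List[str]],
    grammar_weight: float = 0.5,
) -> str:
    """Choose the candidate with the best combined transcription and grammar score."""
    grammars = grammars_by_context.get(context, [])

    def grammar_confidence(text: str) -> float:
        # Assumption: binary confidence, 1.0 if any grammar matches the whole text.
        return max(
            (1.0 if re.fullmatch(pattern, text) else 0.0 for pattern in grammars),
            default=0.0,
        )

    def combined(item: Tuple[str, float]) -> float:
        text, transcription_confidence = item
        return transcription_confidence + grammar_weight * grammar_confidence(text)

    return max(candidates, key=combined)[0]
```

In a hypothetical "home" context with a grammar such as `lights (on|out|off)`, the candidate "lights out" can be selected even when "flights out" has the higher transcription confidence.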

65.

Granted invention patent
Negative n-gram biasing (in force)

Publication No.: US11282513B2

Publication date: 2022-03-22

Application No.: US16902246

Application date: 2020-06-15

Applicant: Google LLC

Inventor: Pedro J. Moreno Mengibar, Petar Aleksic

IPC: G10L15/22, G10L15/01, G10L15/197

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing dynamic, stroke-based alignment of touch displays. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on determining that the particular n-gram that is included in the candidate transcription is included among the set of undesirable n-grams that is associated with the context, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score.
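
The confidence adjustment described here could look roughly like the following sketch. The multiplicative `penalty`, the output `threshold`, the limit of three words per n-gram, and the dictionary of undesirable n-grams per context are illustrative assumptions, not values taken from the patent.

```python
from typing import Dict, List, Set


def ngrams(tokens: List[str], n: int) -> Set[str]:
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def adjust_confidence(
    transcription: str,
    confidence: float,
    context: str,
    undesirable_by_context: Dict[str, Set[str]],
    penalty: float = 0.5,
    max_n: int = 3,
) -> float:
    """Lower the score if the candidate contains an undesirable n-gram for the context."""
    tokens = transcription.lower().split()
    undesirable = undesirable_by_context.get(context, set())
    found = any(ngrams(tokens, n) & undesirable for n in range(1, max_n + 1))
    return confidence * penalty if found else confidence


def should_output(adjusted_confidence: float, threshold: float = 0.6) -> bool:
    """Decide whether to provide the candidate, based on the adjusted score."""
    return adjusted_confidence >= threshold
```

A candidate that scored 0.8 but contains an n-gram flagged for the current context would drop to 0.4 under these assumed values and be withheld.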

66.

Granted invention patent
Determining dialog states for language models (in force)

Publication No.: US11264028B2

Publication date: 2022-03-01

Application No.: US16732645

Application date: 2020-01-02

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/22, G10L15/26, G06F40/30, G06F40/295, G10L15/065, G10L15/197, G10L15/183

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
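
The biasing step can be illustrated with a toy unigram-style model, as in the sketch below. The dictionary representation of the language model, the multiplicative `boost`, and the renormalization are assumptions chosen to keep the example small; the abstract only says that probability scores for the n-grams associated with the dialog state are adjusted.

```python
from typing import Dict, Set


def bias_language_model(
    base_probs: Dict[str, float],
    dialog_state: str,
    ngrams_by_state: Dict[str, Set[str]],
    boost: float = 3.0,
) -> Dict[str, float]:
    """Scale up n-grams tied to the dialog state, then renormalize to sum to 1."""
    boosted = ngrams_by_state.get(dialog_state, set())
    scaled = {
        gram: prob * (boost if gram in boosted else 1.0)
        for gram, prob in base_probs.items()
    }
    total = sum(scaled.values())
    return {gram: prob / total for gram, prob in scaled.items()}
```

In a hypothetical pizza-ordering dialog, a "toppings" state associated with "pepperoni" would make that n-gram more probable than an acoustically similar phrase that started with the same base probability.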

67.

Granted invention patent
Voice to text conversion based on third-party agent content (in force)

Publication No.: US11232797B2

Publication date: 2022-01-25

Application No.: US16791334

Application date: 2020-02-14

Applicant: Google LLC

Inventor: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic

IPC: G10L15/26, G10L15/22, G10L15/18, G10L15/30

Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

68.

Invention application
ALPHANUMERIC SEQUENCE BIASING FOR AUTOMATIC SPEECH RECOGNITION (in force)

Publication No.: US20220013126A1

Publication date: 2022-01-13

Application No.: US17251465

Application date: 2020-01-17

Applicant: Google LLC

Inventor: Benjamin Haynor, Petar Aleksic

IPC: G10L15/26, G10L15/16, G10L15/193

Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one

69.

Invention application
LANGUAGE MODEL BIASING SYSTEM (in force)

Publication No.: US20210358479A1

Publication date: 2021-11-18

Application No.: US17337400

Application date: 2021-06-02

Applicant: Google LLC

Inventor: Petar Aleksic, Pedro J. Moreno Mengibar

IPC: G10L15/07, G10L15/187, G10L15/18, G10L15/197

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
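
The expand-then-rescore flow can be sketched as follows. The abstract does not say how the expanded set is generated, so expanding each initial n-gram into all of its contiguous sub-n-grams is purely an assumption, as are the additive `boost` and the dictionary of candidate scores.

```python
from typing import Dict, Iterable, Set


def expand_ngrams(initial: Iterable[str]) -> Set[str]:
    """Assumed expansion: every contiguous sub-n-gram of each initial n-gram."""
    expanded: Set[str] = set()
    for gram in initial:
        words = gram.lower().split()
        for start in range(len(words)):
            for end in range(start + 1, len(words) + 1):
                expanded.add(" ".join(words[start:end]))
    return expanded


def rescore(
    candidates: Dict[str, float], expanded: Set[str], boost: float = 0.2
) -> Dict[str, float]:
    """Add a fixed boost to candidates that appear in the expanded n-gram set."""
    return {
        text: score + (boost if text.lower() in expanded else 0.0)
        for text, score in candidates.items()
    }
```

If the context data contained "Italian restaurant", the expanded set would include "italian", so a candidate "italian" would receive the boost while an unrelated candidate keeps its original score.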

70.

Invention application
ENHANCED SPEECH ENDPOINTING (in force)

Publication No.: US20210090554A1

Publication date: 2021-03-25

Application No.: US17115403

Application date: 2020-12-08

Applicant: Google LLC

Inventor: Petar Aleksic, Glen Shires, Michael Buchanan

IPC: G10L15/05, G10L15/04, G06F3/16, G10L15/22, G10L15/26, G10L25/78

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.
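
The early-endpointing comparison can be sketched as below. The list of intermediate results stands in for a streaming recognizer, and exact matching after lowercasing and whitespace normalization is an assumed notion of "corresponds to"; a real system would likely use a looser comparison.

```python
from typing import Iterable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def endpoint(
    intermediate_results: Iterable[str], expected_results: Iterable[str]
) -> Tuple[Optional[int], Optional[str]]:
    """Return (index, final result) at the first intermediate result that is expected.

    Returns (None, None) if no intermediate result matches, in which case
    endpointing would have to fall back to some other mechanism.
    """
    expected = {normalize(item): item for item in expected_results}
    for index, partial in enumerate(intermediate_results):
        key = normalize(partial)
        if key in expected:
            # End-of-speech condition: stop listening and finalize immediately.
            return index, expected[key]
    return None, None
```

For a yes/no prompt with expected results "yes" and "no", the sketch finalizes as soon as the intermediate result reads "yes" instead of waiting for a silence timeout.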
