-
Publication No.: US20210280177A1
Publication date: 2021-09-09
Application No.: US17328400
Filing date: 2021-05-24
Applicant: Google LLC
IPC classification: G10L15/197, G10L15/00, G10L15/22, G10L15/30, G10L15/08, G10L15/14, G10L15/18, G10L13/00
Abstract: Determining a language for speech recognition of a spoken utterance received via an automated assistant interface for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant without requiring a user to explicitly designate a language to be utilized for each interaction. Implementations determine a user profile that corresponds to audio data that captures a spoken utterance, and utilize language(s), and optionally corresponding probabilities, assigned to the user profile in determining a language for speech recognition of the spoken utterance. Some implementations select only a subset of the languages assigned to the user profile to utilize in speech recognition of a given spoken utterance of the user. Some implementations perform speech recognition in each of multiple languages assigned to the user profile, and utilize criteria to select only one of the speech recognitions as appropriate for generating and providing content that is responsive to the spoken utterance.
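The selection idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the function names, the top-k subset rule, and the "profile probability times recognizer confidence" criterion are all assumptions made for illustration, since the abstract does not specify the criteria used.

```python
from typing import Dict, List, Tuple


def candidate_languages(profile: Dict[str, float], max_languages: int = 2) -> List[str]:
    """Return the most probable languages assigned to a user profile.

    `profile` maps a language code to its assigned probability
    (hypothetical data layout, not taken from the patent).
    """
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    return [lang for lang, _ in ranked[:max_languages]]


def select_recognition(
    profile: Dict[str, float],
    hypotheses: Dict[str, Tuple[str, float]],
) -> Tuple[str, str]:
    """Pick one recognition result out of several per-language results.

    `hypotheses` maps a language code to (transcript, recognizer confidence).
    The scoring rule (profile probability * confidence) is an assumed
    criterion chosen only to make the sketch concrete.
    """
    best_lang = max(
        hypotheses,
        key=lambda lang: profile.get(lang, 0.0) * hypotheses[lang][1],
    )
    return best_lang, hypotheses[best_lang][0]


if __name__ == "__main__":
    profile = {"en": 0.6, "es": 0.3, "fr": 0.1}
    print(candidate_languages(profile))
    hypotheses = {"en": ("play musica", 0.40), "es": ("pon musica", 0.95)}
    print(select_recognition(profile, hypotheses))
```

In this sketch a strong recognizer confidence in a less likely profile language can outweigh the profile prior, which is one plausible reading of "utilize criteria to select only one of the speech recognitions".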
-
Publication No.: US10831366B2
Publication date: 2020-11-10
Application No.: US15393676
Filing date: 2016-12-29
Applicant: Google LLC
Inventors: Yu Ouyang, Diego Melendo Casado, Mohammadinamul Hasan Sheik, Francoise Beaufays, Dragan Zivkovic, Meltem Oktem
IPC classification: G06F3/0488, G06F3/16, G06F1/16, G06F3/023, G06F40/166, G06F40/289, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
-
Publication No.: US20190318724A1
Publication date: 2019-10-17
Application No.: US15973466
Filing date: 2018-05-07
Applicant: Google LLC
Abstract: The present disclosure relates generally to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. The system can enable multilingual interaction with the automated assistant without requiring a user to explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
-
Publication No.: US20180308472A1
Publication date: 2018-10-25
Application No.: US15956493
Filing date: 2018-04-18
Applicant: Google LLC
CPC classification: G10L17/06, G06F17/30764, G06F21/32, G06K9/00362, G10L15/07, G10L15/08, G10L15/22, G10L15/265, G10L17/005
Abstract: In some implementations, an utterance is determined to include a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword. In response to determining that an utterance includes a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword, at least a portion of the utterance is stored as a new sample. A second set of samples of the particular user speaking the utterance is obtained, where the second set of samples includes the new sample and less than all the samples in the first set of samples. A second utterance is determined to include the particular user speaking the hotword based at least on the second set of samples of the user speaking the hotword.
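The sample-set update described in the abstract above (the second set contains the new sample and fewer than all of the first set's samples) can be sketched as a fixed-size window. This is only an illustration: the function name `update_samples`, the string stand-ins for audio samples, and the choice to drop the oldest sample are assumptions, since the abstract does not say which samples are discarded.

```python
from collections import deque
from typing import Deque, List


def update_samples(samples: List[str], new_sample: str, max_samples: int) -> List[str]:
    """Return a new sample set that includes `new_sample`.

    When the set is already at `max_samples`, the oldest sample is dropped,
    so the result holds the new sample plus fewer than all earlier samples.
    Dropping the oldest is an assumed policy for this sketch.
    """
    window: Deque[str] = deque(samples, maxlen=max_samples)
    window.append(new_sample)  # a full deque discards its leftmost item
    return list(window)


if __name__ == "__main__":
    first_set = ["sample_a", "sample_b", "sample_c"]
    second_set = update_samples(first_set, "sample_d", max_samples=3)
    print(second_set)
```

A bounded window keeps the speaker model tracking recent speech while capping storage; other policies (for example, dropping the lowest-quality sample) would fit the abstract equally well.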
-
Publication No.: US20180286406A1
Publication date: 2018-10-04
Application No.: US15952434
Filing date: 2018-04-13
Applicant: Google LLC
CPC classification: G10L15/30, G10L15/16, G10L15/22, G10L25/78, G10L2015/088, G10L2015/223, H04L67/10, H05K999/99
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance. The actions further include determining that the utterance likely includes a particular, predefined hotword. The actions further include transmitting (i) data indicating that the computing device likely received the particular, predefined hotword, (ii) data identifying the computing device, and (iii) data identifying a group of nearby computing devices that includes the computing device. The actions further include receiving an instruction to commence speech recognition processing on the audio data. The actions further include, in response to receiving the instruction to commence speech recognition processing on the audio data, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
-
Publication No.: US10008207B2
Publication date: 2018-06-26
Application No.: US15233090
Filing date: 2016-08-10
Applicant: Google LLC
IPC classification: G10L15/16, G10L17/10, G10L15/20, G10L15/22, G10L17/02, G10L17/18, G10L17/24, G10L15/08, G10L25/30, G10L15/12, G10L17/00
CPC classification: G10L17/10, G10L15/12, G10L15/16, G10L15/20, G10L15/22, G10L17/00, G10L17/02, G10L17/18, G10L17/24, G10L25/30, G10L2015/088, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
-
Publication No.: US09972320B2
Publication date: 2018-05-15
Application No.: US15278269
Filing date: 2016-09-28
Applicant: Google LLC
CPC classification: G10L15/30, G10L15/16, G10L15/22, G10L25/78, G10L2015/088, G10L2015/223, H04L67/10
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance. The actions further include determining that the utterance likely includes a particular, predefined hotword. The actions further include transmitting (i) data indicating that the computing device likely received the particular, predefined hotword, (ii) data identifying the computing device, and (iii) data identifying a group of nearby computing devices that includes the computing device. The actions further include receiving an instruction to commence speech recognition processing on the audio data. The actions further include, in response to receiving the instruction to commence speech recognition processing on the audio data, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
-
Publication No.: US11942083B2
Publication date: 2024-03-26
Application No.: US17303139
Filing date: 2021-05-21
Applicant: Google LLC
IPC classification: G10L15/00, G06F3/16, G10L15/20, G10L15/22, G10L17/06, G10L21/034, G10L25/84, H03G3/30, G10L15/26, G10L17/00
CPC classification: G10L15/20, G06F3/165, G06F3/167, G10L15/222, G10L17/06, G10L21/034, G10L25/84, H03G3/3005, G10L15/26, G10L17/00
Abstract: The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
-
Publication No.: US20240086063A1
Publication date: 2024-03-14
Application No.: US18517825
Filing date: 2023-11-22
Applicant: Google LLC
Inventors: Yu Ouyang, Diego Melendo Casado, Mohammadinamul Hasan Sheik, Francoise Beaufays, Dragan Zivkovic, Meltem Oktem
IPC classification: G06F3/04886, G06F1/16, G06F3/023, G06F3/04883, G06F3/16, G06F40/166, G06F40/289
CPC classification: G06F3/04886, G06F1/1626, G06F3/0233, G06F3/04883, G06F3/167, G06F40/166, G06F40/289, G06F2203/0381, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
-
Publication No.: US11727918B2
Publication date: 2023-08-15
Application No.: US17375573
Filing date: 2021-07-14
Applicant: Google LLC
IPC classification: G10L15/08, G06F21/32, G10L17/06, G06F16/635, G10L15/22, G10L17/00, G06V40/10, G10L15/07, G10L15/26
CPC classification: G10L15/08, G06F16/636, G06F21/32, G06V40/10, G10L15/07, G10L15/22, G10L17/00, G10L17/06, G10L15/26, G10L2015/088
Abstract: In some implementations, a set of audio recordings capturing utterances of a user is received by a first speech-enabled device. Based on the set of audio recordings, the first speech-enabled device generates a first user voice recognition model for use in subsequently recognizing a voice of the user at the first speech-enabled device. Further, a particular user account associated with the first user voice recognition model is determined, and an indication that a second speech-enabled device is associated with the particular user account is received. In response to receiving the indication, the set of audio recordings is provided to the second speech-enabled device. Based on the set of audio recordings, the second speech-enabled device generates a second user voice recognition model for use in subsequently recognizing the voice of the user at the second speech-enabled device.