-
Publication No.: US11961525B2
Publication Date: 2024-04-16
Application No.: US17444384
Filing Date: 2021-08-03
Applicant: Google LLC
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
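The training-sample structure in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the first utterance is compared with the second utterances; averaging the second-utterance representations and scoring by cosine similarity are assumptions made here for illustration, and the names `TrainingSample`, `mean_vector`, and `sample_score` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class TrainingSample:
    first: Vector          # representation of the first utterance
    seconds: List[Vector]  # representations of one or more second utterances
    matching: bool         # True for a "matching speakers" sample


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sample_score(sample: TrainingSample) -> float:
    """Assumed scoring rule: first utterance vs. mean of the second utterances."""
    return cosine(sample.first, mean_vector(sample.seconds))
```

Under these assumptions, a training loop would push `sample_score` up for samples labeled as matching and down for samples labeled as non-matching.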
-
Publication No.: US11942083B2
Publication Date: 2024-03-26
Application No.: US17303139
Filing Date: 2021-05-21
Applicant: Google LLC
IPC Classification: G10L15/00 , G06F3/16 , G10L15/20 , G10L15/22 , G10L17/06 , G10L21/034 , G10L25/84 , H03G3/30 , G10L15/26 , G10L17/00
CPC Classification: G10L15/20 , G06F3/165 , G06F3/167 , G10L15/222 , G10L17/06 , G10L21/034 , G10L25/84 , H03G3/3005 , G10L15/26 , G10L17/00
Abstract: The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
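The level-reduction step in the abstract above can be sketched as a small decision function. The trained model is represented only by its output, a probability that the additional audio is a user utterance; the function name, the threshold of 0.5, and the reduction factor of 0.3 are hypothetical values chosen for illustration.

```python
def next_output_level(
    current_level: float,
    user_speech_prob: float,
    threshold: float = 0.5,    # assumed decision threshold
    duck_factor: float = 0.3,  # assumed fraction of the level kept while ducking
) -> float:
    """Return the speaker output level after considering possible user speech.

    `user_speech_prob` stands in for the output of the model described in the
    abstract; how that model is built is outside this sketch.
    """
    if user_speech_prob >= threshold:
        # The additional audio is treated as a user utterance: reduce the level.
        return current_level * duck_factor
    return current_level
```

For example, `next_output_level(0.8, 0.9)` lowers the level, while `next_output_level(0.8, 0.1)` leaves it unchanged.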
-
Publication No.: US20240029742A1
Publication Date: 2024-01-25
Application No.: US18479615
Filing Date: 2023-10-02
Applicant: Google LLC
Inventors: Ignacio Lopez Moreno , Quan Wang , Jason Pelecanos , Yiling Huang , Mert Saglam
IPC Classification: G10L17/06 , G06F16/245 , G06N3/08 , G10L17/04 , G10L17/18
CPC Classification: G10L17/06 , G06F16/245 , G06N3/08 , G10L17/04 , G10L17/18
Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, where the evaluation ad-vector includes n_e style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
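The abstract above does not give the scoring formula, so the following is only one plausible reading, offered as a sketch. It assumes each style class is a flat list holding a value vector followed by a routing vector, that routing-vector dot products are turned into attention weights with a softmax over all evaluation/reference class pairs, and that the score is the weighted sum of value-vector dot products. The function names and the `value_dim` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def split_class(style_class: Vector, value_dim: int) -> Tuple[Vector, Vector]:
    """Split a style class into (value vector, routing vector)."""
    return style_class[:value_dim], style_class[value_dim:]


def attention_score(
    eval_ad: List[Vector], ref_ad: List[Vector], value_dim: int
) -> float:
    """Assumed attentive score between an evaluation and a reference ad-vector."""
    logits, similarities = [], []
    for eval_class in eval_ad:
        eval_value, eval_routing = split_class(eval_class, value_dim)
        for ref_class in ref_ad:
            ref_value, ref_routing = split_class(ref_class, value_dim)
            logits.append(dot(eval_routing, ref_routing))    # how well conditions align
            similarities.append(dot(eval_value, ref_value))  # how similar the voices are
    # Numerically stable softmax over all class pairs.
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, similarities))
```

Under this reading, the routing vectors decide which pairs of style classes are comparable, and the value vectors carry the speaker evidence for those pairs.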
-
Publication No.: US11727918B2
Publication Date: 2023-08-15
Application No.: US17375573
Filing Date: 2021-07-14
Applicant: Google LLC
IPC Classification: G10L15/08 , G06F21/32 , G10L17/06 , G06F16/635 , G10L15/22 , G10L17/00 , G06V40/10 , G10L15/07 , G10L15/26
CPC Classification: G10L15/08 , G06F16/636 , G06F21/32 , G06V40/10 , G10L15/07 , G10L15/22 , G10L17/00 , G10L17/06 , G10L15/26 , G10L2015/088
Abstract: In some implementations, a set of audio recordings capturing utterances of a user is received by a first speech-enabled device. Based on the set of audio recordings, the first speech-enabled device generates a first user voice recognition model for use in subsequently recognizing a voice of the user at the first speech-enabled device. Further, a particular user account associated with the first voice recognition model is determined, and an indication that a second speech-enabled device is associated with the particular user account is received. In response to receiving the indication, the set of audio recordings is provided to the second speech-enabled device. Based on the set of audio recordings, the second speech-enabled device generates a second user voice recognition model for use in subsequently recognizing the voice of the user at the second speech-enabled device.
-
Publication No.: US20230113617A1
Publication Date: 2023-04-13
Application No.: US18078476
Filing Date: 2022-12-09
Applicant: Google LLC
Abstract: Text-independent speaker recognition models can be utilized by an automated assistant to verify that a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying that a particular user spoke a spoken utterance using output generated by both a text-independent speaker recognition model and a text-dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
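The automatic update of a speaker embedding mentioned in the abstract above could take many forms. A minimal sketch, assuming the stored embedding is kept as a running mean over the embeddings of the user's previous utterances, is shown below; the running-mean rule and the name `update_embedding` are assumptions, not details taken from the abstract.

```python
from typing import List, Tuple

Vector = List[float]


def update_embedding(
    current: Vector, utterance_count: int, new_utterance: Vector
) -> Tuple[Vector, int]:
    """Fold one more utterance embedding into a running-mean speaker embedding.

    `current` is the mean of `utterance_count` earlier utterance embeddings.
    Returns the updated mean and the new count.
    """
    new_count = utterance_count + 1
    updated = [
        (c * utterance_count + n) / new_count
        for c, n in zip(current, new_utterance)
    ]
    return updated, new_count
```

A running mean keeps only the current embedding and a counter, so earlier audio does not need to be retained to apply the update.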
-
Publication No.: US11527235B2
Publication Date: 2022-12-13
Application No.: US17046994
Filing Date: 2019-12-02
Applicant: Google LLC
Abstract: Text-independent speaker recognition models can be utilized by an automated assistant to verify that a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying that a particular user spoke a spoken utterance using output generated by both a text-independent speaker recognition model and a text-dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
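Verification using output from both a text-independent (TI) and a text-dependent (TD) model, as described in the abstract above, can be sketched as a simple score fusion. The abstract does not state how the two outputs are combined; the weighted average, the weight of 0.5, the threshold of 0.7, and the name `verify_speaker` are all hypothetical.

```python
from typing import Tuple


def verify_speaker(
    ti_score: float,
    td_score: float,
    ti_weight: float = 0.5,  # assumed weight given to the TI model output
    threshold: float = 0.7,  # assumed acceptance threshold
) -> Tuple[bool, float]:
    """Combine TI and TD scores and decide whether the speaker is verified.

    Both scores are assumed to lie in [0, 1], higher meaning a closer match.
    Returns (verified, combined_score).
    """
    combined = ti_weight * ti_score + (1.0 - ti_weight) * td_score
    return combined >= threshold, combined
```

With these defaults, `verify_speaker(0.9, 0.7)` accepts the speaker and `verify_speaker(0.4, 0.6)` rejects.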
-
Publication No.: US20220284891A1
Publication Date: 2022-09-08
Application No.: US17190779
Filing Date: 2021-03-03
Applicant: Google LLC
IPC Classification: G10L15/22 , G10L15/06 , G10L15/08 , G06K9/62 , G10L21/0208
Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
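The two ingredients named in the abstract above, masking parts of the audio and comparing a teacher's soft label with the student's prediction, can be sketched as follows. The spectrogram is assumed to be a list of time frames, each a list of frequency-bin values; cross-entropy is an assumed choice for the comparison, since the abstract does not name a loss. The function names are hypothetical.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # [time frame][frequency bin]


def mask_spectrogram(
    spec: Spectrogram, t_start: int, t_width: int, f_start: int, f_width: int
) -> Spectrogram:
    """Return a copy with one band of time frames and one band of frequencies zeroed."""
    out = [frame[:] for frame in spec]
    for t in range(t_start, min(t_start + t_width, len(out))):
        out[t] = [0.0] * len(out[t])  # mask whole time frames
    for frame in out:
        for f in range(f_start, min(f_start + f_width, len(frame))):
            frame[f] = 0.0            # mask a band of frequency bins
    return out


def soft_label_loss(
    teacher_probs: List[float], student_probs: List[float], eps: float = 1e-12
) -> float:
    """Cross-entropy of the student's prediction against the teacher's soft label."""
    return -sum(p * math.log(q + eps) for p, q in zip(teacher_probs, student_probs))
```

In a training step under these assumptions, the same masked spectrogram would be passed to both models, and `soft_label_loss` would supply the value used to update the student.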
-
Publication No.: US20220122611A1
Publication Date: 2022-04-21
Application No.: US17567590
Filing Date: 2022-01-03
Applicant: Google LLC
Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation) using a mask generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
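The transform, mask, and inverse-transform pipeline in the abstract above can be sketched on a single audio frame. A plain discrete Fourier transform stands in for the frequency transformation, and the mask is passed in by the caller as a placeholder for the output of the trained voice filter model, which is not modeled here. The function names are hypothetical.

```python
import cmath
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive discrete Fourier transform of one frame (fine for short frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT, keeping the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def apply_mask(frame: List[float], mask: List[float]) -> List[float]:
    """Transform a frame, scale each frequency bin by the mask, transform back.

    `mask` is a stand-in for the per-bin mask that the abstract says is
    produced from the spectrogram and a speaker embedding.
    """
    spectrum = dft(frame)
    masked = [m * s for m, s in zip(mask, spectrum)]
    return idft(masked)
```

A mask of all ones returns the frame unchanged up to rounding error, and a mask of all zeros silences it; a learned mask would sit between these extremes, bin by bin.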
-
Publication No.: US11238848B2
Publication Date: 2022-02-01
Application No.: US16709132
Filing Date: 2019-12-10
Applicant: Google LLC
Inventors: Meltem Oktem , Taral Pradeep Joglekar , Fnu Heryandi , Pu-sen Chao , Ignacio Lopez Moreno , Salil Rajadhyaksha , Alexander H. Gruenstein , Diego Melendo Casado
IPC Classification: G10L15/08 , G06F21/32 , G10L17/06 , G06F16/635 , G10L15/22 , G10L17/00 , G06K9/00 , G10L15/07 , G10L15/26
Abstract: In some implementations, authentication tokens corresponding to known users of a device are stored on the device. An utterance from a speaker is received. The speaker of the utterance is classified as not a known user of the device. A query that includes the authentication tokens that correspond to known users of the device, a representation of the utterance, and an indication that the speaker was classified as not a known user of the device is provided to a server. A response to the query is received at the device and from the server based on the query.
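The contents of the query described in the abstract above can be sketched as a plain data structure. The field names, the dictionary layout, and the `build_query` helper are hypothetical; the abstract specifies only what information the query carries, not its format.

```python
from typing import Dict, List, Optional, Union

QueryValue = Union[List[str], str, bool]


def build_query(
    known_user_tokens: Dict[str, str],  # user name -> authentication token stored on the device
    utterance_repr: str,                # some representation of the utterance
    classified_user: Optional[str],     # None when the speaker matched no known user
) -> Dict[str, QueryValue]:
    """Assemble the query a device might send to the server."""
    return {
        # Tokens of all known users are included even for an unrecognized speaker.
        "auth_tokens": sorted(known_user_tokens.values()),
        "utterance": utterance_repr,
        "speaker_is_known_user": classified_user is not None,
    }
```

In the case the abstract describes, `classified_user` is `None`, so the query carries the stored tokens together with a flag saying the speaker was not recognized.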
-
Publication No.: US11238847B2
Publication Date: 2022-02-01
Application No.: US17251163
Filing Date: 2019-12-04
Applicant: Google LLC
Inventors: Ignacio Lopez Moreno , Quan Wang , Jason Pelecanos , Li Wan , Alexander Gruenstein , Hakan Erdogan
IPC Classification: G10L17/00 , G10L15/06 , G10L15/07 , G10L15/20 , G10L17/04 , G10L17/20 , G10L21/0208 , G10L15/08
Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing an SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher-student learning.
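The abstract above says the SD speech model processes a speaker embedding along with an instance of audio data, but not how the two are joined. One common approach, assumed here purely for illustration, is to append the embedding to every audio feature frame before the model sees it. The name `condition_on_speaker` is hypothetical.

```python
from typing import List

Vector = List[float]


def condition_on_speaker(frames: List[Vector], speaker_embedding: Vector) -> List[Vector]:
    """Append the same speaker embedding to every audio feature frame.

    The result is the assumed input to a single shared SD speech model; swapping
    the embedding personalizes that model for a different target user.
    """
    return [frame + speaker_embedding for frame in frames]


# The same audio frames prepared for two different target users.
audio_frames = [[0.1, 0.2], [0.3, 0.4]]
input_for_user_a = condition_on_speaker(audio_frames, [1.0, 0.0])
input_for_user_b = condition_on_speaker(audio_frames, [0.0, 1.0])
```

Under this assumption the model weights stay the same for every user, and only the embedding supplied at inference time changes.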
-