Neural networks for speaker verification

    Publication number: US11961525B2

    Publication date: 2024-04-16

    Application number: US17444384

    Filing date: 2021-08-03

    Applicant: Google LLC

    IPC classes: G10L17/18 G10L17/02 G10L17/04

    CPC classes: G10L17/18 G10L17/02 G10L17/04

    Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
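The matching/non-matching training setup described in the abstract can be sketched as a pairwise loss over speaker representations. The one-layer "network" and the scaled-cosine logistic loss below are illustrative stand-ins, not the patented architecture:

```python
import numpy as np

def speaker_embedding(features, weights):
    # Toy stand-in for the on-device neural network: one linear layer
    # followed by L2 normalization (hypothetical architecture).
    h = features @ weights
    return h / np.linalg.norm(h)

def matching_loss(emb_a, emb_b, label, scale=10.0, bias=-5.0):
    # Logistic loss on a scaled cosine score. `label` is 1 for a
    # matching-speakers sample and 0 for a non-matching sample, as in
    # the labeled training samples described above.
    score = scale * float(emb_a @ emb_b) + bias
    p = 1.0 / (1.0 + np.exp(-score))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))
```

Training would minimize this loss over many labeled pairs, pushing same-speaker embeddings together and different-speaker embeddings apart.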

    ATTENTIVE SCORING FUNCTION FOR SPEAKER IDENTIFICATION

    Publication number: US20240029742A1

    Publication date: 2024-01-25

    Application number: US18479615

    Filing date: 2023-10-02

    Applicant: Google LLC

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, where the evaluation ad-vector includes n_e style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
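The style-class structure suggests an attention score where routing vectors decide which class pairs matter and value vectors carry the similarity. This is a minimal sketch under that assumption; the actual scoring function and network are not specified here:

```python
import numpy as np

def attentive_score(eval_classes, ref_classes):
    # Each "style class" is modeled as a (value_vector, routing_vector)
    # pair, mirroring the concatenated structure in the abstract.
    # Routing-vector affinities set the attention weights.
    logits = np.array([[e_r @ r_r for (_, r_r) in ref_classes]
                       for (_, e_r) in eval_classes])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # The score is the attention-weighted sum of value-vector similarities.
    sims = np.array([[e_v @ r_v for (r_v, _) in ref_classes]
                     for (e_v, _) in eval_classes])
    return float((weights * sims).sum())
```

Identification would then pick the enrolled user whose reference ad-vector yields the highest score.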

    TEXT INDEPENDENT SPEAKER RECOGNITION

    Publication number: US20230113617A1

    Publication date: 2023-04-13

    Application number: US18078476

    Filing date: 2022-12-09

    Applicant: GOOGLE LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
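Two of the ideas above, updating a stored speaker embedding from new utterances and combining text-dependent (TD) with text-independent (TI) model outputs, can be sketched as follows. The EMA update rule and the both-thresholds decision rule are plausible assumptions, not the claimed method:

```python
import numpy as np

def update_speaker_embedding(stored, new_emb, alpha=0.1):
    # Hypothetical update rule: exponential moving average of the stored
    # embedding with the embedding of a newly verified utterance,
    # re-normalized to unit length.
    v = (1.0 - alpha) * stored + alpha * new_emb
    return v / np.linalg.norm(v)

def verify(td_score, ti_score, td_thresh=0.7, ti_thresh=0.5):
    # One plausible combination rule: accept only when both the
    # text-dependent and text-independent scores clear their thresholds.
    return td_score >= td_thresh and ti_score >= ti_thresh
```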

    Text independent speaker recognition

    Publication number: US11527235B2

    Publication date: 2022-12-13

    Application number: US17046994

    Filing date: 2019-12-02

    Applicant: Google LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
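The prefetching idea, fetching content for every candidate user before the speaker is identified, amounts to a simple speculative-fetch pattern. The helpers below are hypothetical; `fetch` stands in for whatever content backend the assistant would call:

```python
def prefetch_for_candidates(candidate_ids, fetch):
    # Fetch content for every candidate speaker before the final
    # identification is known, so the winner's content is ready
    # with no extra latency.
    return {uid: fetch(uid) for uid in candidate_ids}

def serve(prefetched, identified_uid):
    # Once the speaker is identified, keep only that user's content.
    return prefetched[identified_uid]
```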

    NOISY STUDENT TEACHER TRAINING FOR ROBUST KEYWORD SPOTTING

    Publication number: US20220284891A1

    Publication date: 2022-09-08

    Application number: US17190779

    Filing date: 2021-03-03

    Applicant: GOOGLE LLC

    Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
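The two pieces of this recipe, spectral masking of the input and a soft-label comparison between teacher and student, can be sketched as below. Mask widths and the cross-entropy comparison are illustrative assumptions:

```python
import numpy as np

def spec_augment(spec, rng, t_width=2, f_width=2):
    # Aggressive augmentation of a (time, frequency) spectrogram: mask a
    # random block of time frames and a random block of frequency bins,
    # as described above (mask widths are illustrative).
    out = spec.copy()
    t0 = rng.integers(0, spec.shape[0] - t_width + 1)
    f0 = rng.integers(0, spec.shape[1] - f_width + 1)
    out[t0:t0 + t_width, :] = 0.0
    out[:, f0:f0 + f_width] = 0.0
    return out

def distillation_loss(teacher_soft, student_logits):
    # Cross-entropy between the teacher's soft label and the student's
    # predicted distribution; the student is updated from this comparison.
    p = np.exp(student_logits - student_logits.max())
    p /= p.sum()
    return float(-(teacher_soft * np.log(p + 1e-12)).sum())
```

Both teacher and student see the same augmented spectrogram; only the student's parameters are updated from the loss.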

    TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING

    Publication number: US20220122611A1

    Publication date: 2022-04-21

    Application number: US17567590

    Filing date: 2022-01-03

    Applicant: GOOGLE LLC

    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolate one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation) using a mask generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
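The transform → mask → inverse-transform pipeline can be sketched end to end. Here a simple non-overlapping per-frame FFT stands in for the frequency transformation, and `mask_fn` is a hypothetical stand-in for the trained voice filter model:

```python
import numpy as np

def refine_audio(audio, speaker_emb, mask_fn, frame=64):
    # Frequency transformation (a per-frame FFT here, not the actual
    # transform), a mask predicted from the spectrogram plus the speaker
    # embedding, then the inverse transformation, following the pipeline
    # described above. `mask_fn` stands in for the trained model.
    frames = audio.reshape(-1, frame)           # non-overlapping frames
    spec = np.fft.rfft(frames, axis=1)          # frequency transformation
    mask = mask_fn(np.abs(spec), speaker_emb)   # values in [0, 1]
    refined = np.fft.irfft(spec * mask, n=frame, axis=1)
    return refined.reshape(-1)
```

With an all-ones mask the inverse transform recovers the original audio; a learned mask would instead suppress the bins not belonging to the target speaker.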