Neural networks for speaker verification

    Publication number: US11961525B2

    Publication date: 2024-04-16

    Application number: US17444384

    Filing date: 2021-08-03

    Applicant: Google LLC

    IPC classes: G10L17/18 G10L17/02 G10L17/04

    CPC classes: G10L17/18 G10L17/02 G10L17/04

    Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
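The matching/non-matching training setup described in the abstract can be sketched as a pairwise loss over speaker representations. The one-layer "network" and the scaled-cosine logistic loss below are illustrative stand-ins, not the patented architecture:

```python
import numpy as np

def speaker_embedding(features, weights):
    # Toy stand-in for the on-device neural network: one linear layer
    # followed by L2 normalization (hypothetical architecture).
    h = features @ weights
    return h / np.linalg.norm(h)

def matching_loss(emb_a, emb_b, label, scale=10.0, bias=-5.0):
    # Logistic loss on a scaled cosine score. `label` is 1 for a
    # matching-speakers sample and 0 for a non-matching sample, as in
    # the labeled training samples described above.
    score = scale * float(emb_a @ emb_b) + bias
    p = 1.0 / (1.0 + np.exp(-score))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))
```

Training would minimize this loss over many labeled pairs, pushing same-speaker embeddings together and different-speaker embeddings apart.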

    ATTENTIVE SCORING FUNCTION FOR SPEAKER IDENTIFICATION

    Publication number: US20240029742A1

    Publication date: 2024-01-25

    Application number: US18479615

    Filing date: 2023-10-02

    Applicant: Google LLC

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, where the evaluation ad-vector includes n_e style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
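The style-class structure suggests an attention score where routing vectors decide which class pairs matter and value vectors carry the similarity. This is a minimal sketch under that assumption; the actual scoring function and network are not specified here:

```python
import numpy as np

def attentive_score(eval_classes, ref_classes):
    # Each "style class" is modeled as a (value_vector, routing_vector)
    # pair, mirroring the concatenated structure in the abstract.
    # Routing-vector affinities set the attention weights.
    logits = np.array([[e_r @ r_r for (_, r_r) in ref_classes]
                       for (_, e_r) in eval_classes])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # The score is the attention-weighted sum of value-vector similarities.
    sims = np.array([[e_v @ r_v for (r_v, _) in ref_classes]
                     for (e_v, _) in eval_classes])
    return float((weights * sims).sum())
```

Identification would then pick the enrolled user whose reference ad-vector yields the highest score.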

    TEXT INDEPENDENT SPEAKER RECOGNITION

    Publication number: US20230113617A1

    Publication date: 2023-04-13

    Application number: US18078476

    Filing date: 2022-12-09

    Applicant: GOOGLE LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
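Two of the ideas above, updating a stored speaker embedding from new utterances and combining text-dependent (TD) with text-independent (TI) model outputs, can be sketched as follows. The EMA update rule and the both-thresholds decision rule are plausible assumptions, not the claimed method:

```python
import numpy as np

def update_speaker_embedding(stored, new_emb, alpha=0.1):
    # Hypothetical update rule: exponential moving average of the stored
    # embedding with the embedding of a newly verified utterance,
    # re-normalized to unit length.
    v = (1.0 - alpha) * stored + alpha * new_emb
    return v / np.linalg.norm(v)

def verify(td_score, ti_score, td_thresh=0.7, ti_thresh=0.5):
    # One plausible combination rule: accept only when both the
    # text-dependent and text-independent scores clear their thresholds.
    return td_score >= td_thresh and ti_score >= ti_thresh
```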

    Text independent speaker recognition

    Publication number: US11527235B2

    Publication date: 2022-12-13

    Application number: US17046994

    Filing date: 2019-12-02

    Applicant: Google LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
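The prefetching idea, fetching content for every candidate user before the speaker is identified, amounts to a simple speculative-fetch pattern. The helpers below are hypothetical; `fetch` stands in for whatever content backend the assistant would call:

```python
def prefetch_for_candidates(candidate_ids, fetch):
    # Fetch content for every candidate speaker before the final
    # identification is known, so the winner's content is ready
    # with no extra latency.
    return {uid: fetch(uid) for uid in candidate_ids}

def serve(prefetched, identified_uid):
    # Once the speaker is identified, keep only that user's content.
    return prefetched[identified_uid]
```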

    NOISY STUDENT TEACHER TRAINING FOR ROBUST KEYWORD SPOTTING

    Publication number: US20220284891A1

    Publication date: 2022-09-08

    Application number: US17190779

    Filing date: 2021-03-03

    Applicant: GOOGLE LLC

    Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
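The two pieces of this recipe, spectral masking of the input and a soft-label comparison between teacher and student, can be sketched as below. Mask widths and the cross-entropy comparison are illustrative assumptions:

```python
import numpy as np

def spec_augment(spec, rng, t_width=2, f_width=2):
    # Aggressive augmentation of a (time, frequency) spectrogram: mask a
    # random block of time frames and a random block of frequency bins,
    # as described above (mask widths are illustrative).
    out = spec.copy()
    t0 = rng.integers(0, spec.shape[0] - t_width + 1)
    f0 = rng.integers(0, spec.shape[1] - f_width + 1)
    out[t0:t0 + t_width, :] = 0.0
    out[:, f0:f0 + f_width] = 0.0
    return out

def distillation_loss(teacher_soft, student_logits):
    # Cross-entropy between the teacher's soft label and the student's
    # predicted distribution; the student is updated from this comparison.
    p = np.exp(student_logits - student_logits.max())
    p /= p.sum()
    return float(-(teacher_soft * np.log(p + 1e-12)).sum())
```

Both teacher and student see the same augmented spectrogram; only the student's parameters are updated from the loss.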

    TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING

    Publication number: US20220122611A1

    Publication date: 2022-04-21

    Application number: US17567590

    Filing date: 2022-01-03

    Applicant: GOOGLE LLC

    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolate one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation) using a mask generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
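The transform → mask → inverse-transform pipeline can be sketched end to end. Here a simple non-overlapping per-frame FFT stands in for the frequency transformation, and `mask_fn` is a hypothetical stand-in for the trained voice filter model:

```python
import numpy as np

def refine_audio(audio, speaker_emb, mask_fn, frame=64):
    # Frequency transformation (a per-frame FFT here, not the actual
    # transform), a mask predicted from the spectrogram plus the speaker
    # embedding, then the inverse transformation, following the pipeline
    # described above. `mask_fn` stands in for the trained model.
    frames = audio.reshape(-1, frame)           # non-overlapping frames
    spec = np.fft.rfft(frames, axis=1)          # frequency transformation
    mask = mask_fn(np.abs(spec), speaker_emb)   # values in [0, 1]
    refined = np.fft.irfft(spec * mask, n=frame, axis=1)
    return refined.reshape(-1)
```

With an all-ones mask the inverse transform recovers the original audio; a learned mask would instead suppress the bins not belonging to the target speaker.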