-
Publication No.: US11568878B2
Publication Date: 2023-01-31
Application No.: US17233253
Filing Date: 2021-04-16
Applicant: Google LLC
Inventor: Rajeev Rikhye , Quan Wang , Yanzhang He , Qiao Liang , Ian C. McGraw
IPC: G10L17/24 , G10L17/06 , G10L21/028
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
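The abstract above describes a pipeline of speaker separation, text-independent speaker identification, ASR, and a text-level keyphrase match, where new keyphrases require no model retraining. The following Python sketch illustrates that flow under stated assumptions; the model classes, method names (separate, embed, transcribe), and the cosine-similarity threshold are hypothetical placeholders, not the patent's actual implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_keyphrase(audio: np.ndarray,
                     separation_model,
                     tisid_model,
                     asr_model,
                     enrolled_embedding: np.ndarray,
                     keyphrases: list[str],
                     speaker_threshold: float = 0.7):
    # 1. Isolate the target speaker's utterance from other sounds in the audio.
    separated_audio = separation_model.separate(audio)

    # 2. Text-independent speaker ID on the separated audio: compare the
    #    utterance embedding against the registered user's enrollment embedding.
    utterance_embedding = tisid_model.embed(separated_audio)
    speaker_verified = cosine_similarity(utterance_embedding,
                                         enrolled_embedding) >= speaker_threshold

    # 3. ASR on the separated audio to obtain a text representation.
    transcript = asr_model.transcribe(separated_audio).lower()

    # 4. Keyphrase detection is a text comparison, so keyphrases can be added
    #    or changed without retraining any of the models above.
    matched = next((kp for kp in keyphrases if kp.lower() in transcript), None)

    # 5. Act only when the speaker is verified and a keyphrase was found.
    if speaker_verified and matched is not None:
        return matched  # the caller maps the keyphrase to its device action
    return None
```

In this sketch the keyphrase list is an ordinary runtime parameter, which is what allows customization without retraining: only the enrolled embedding and the text comparison change per user.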
-
Publication No.: US20240363122A1
Publication Date: 2024-10-31
Application No.: US18765108
Filing Date: 2024-07-05
Applicant: GOOGLE LLC
Inventor: Rajeev Rikhye , Quan Wang , Yanzhang He , Qiao Liang , Ian C. McGraw
IPC: G10L17/24 , G10L15/26 , G10L17/06 , G10L21/028
CPC classification number: G10L17/24 , G10L15/26 , G10L17/06 , G10L21/028
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
-
Publication No.: US12033641B2
Publication Date: 2024-07-09
Application No.: US18103324
Filing Date: 2023-01-30
Applicant: Google LLC
Inventor: Rajeev Rikhye , Quan Wang , Yanzhang He , Qiao Liang , Ian C. McGraw
IPC: G10L17/24 , G10L15/26 , G10L17/06 , G10L21/028
CPC classification number: G10L17/24 , G10L15/26 , G10L17/06 , G10L21/028
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
-
Publication No.: US20230298591A1
Publication Date: 2023-09-21
Application No.: US18123060
Filing Date: 2023-03-17
Applicant: Google LLC
Inventor: Shaojin Ding , Rajeev Rikhye , Qiao Liang , Yanzhang He , Quan Wang , Arun Narayanan , Tom O'Malley , Ian McGraw
Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
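This abstract describes generating feature-wise linear modulation (FiLM) parameters from a target speaker embedding and applying them as a scale-and-shift affine transformation to a reference speaker embedding before classification. The NumPy sketch below shows only that structure; the projection matrices, dimensions, and classifier weights are hypothetical stand-ins for learned layers, not the method's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 256

# Embeddings: one derived from the incoming sequence of acoustic frames,
# one enrolled for the target speaker.
reference_emb = rng.standard_normal(dim)   # reference speaker embedding
target_emb = rng.standard_normal(dim)      # target speaker embedding

# FiLM parameter generators (stand-ins for learned layers) produce a
# scaling vector and a shifting vector from the target embedding.
W_scale = rng.standard_normal((dim, dim)) / np.sqrt(dim)
W_shift = rng.standard_normal((dim, dim)) / np.sqrt(dim)
scale = W_scale @ target_emb               # scaling vector (gamma)
shift = W_shift @ target_emb               # shifting vector (beta)

# Affine transformation: scale and shift the reference embedding element-wise.
film_out = scale * reference_emb + shift

# Classification output: probability that the utterance was spoken
# by the target speaker.
w_cls = rng.standard_normal(dim) / np.sqrt(dim)
logit = float(w_cls @ film_out)
prob_target = 1.0 / (1.0 + np.exp(-logit))
print(f"P(target speaker) = {prob_target:.3f}")
```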
-
Publication No.: US20230169984A1
Publication Date: 2023-06-01
Application No.: US18103324
Filing Date: 2023-01-30
Applicant: Google LLC
Inventor: Rajeev Rikhye , Quan Wang , Yanzhang He , Qiao Liang , Ian C. McGraw
IPC: G10L17/24 , G10L17/06 , G10L21/028
CPC classification number: G10L17/24 , G10L17/06 , G10L21/028
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
-
Publication No.: US20220335953A1
Publication Date: 2022-10-20
Application No.: US17233253
Filing Date: 2021-04-16
Applicant: Google LLC
Inventor: Rajeev Rikhye , Quan Wang , Yanzhang He , Qiao Liang , Ian C. McGraw
IPC: G10L17/24 , G10L21/028 , G10L17/06
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.