-
Publication No.: US12073824B2
Publication Date: 2024-08-27
Application No.: US17616135
Application Date: 2020-12-03
Applicant: GOOGLE LLC
Inventor: Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-Yiin Chang , Wei Li
CPC classification number: G10L15/16 , G06N3/08 , G10L15/05 , G10L15/063 , G10L15/22 , G10L2015/0635
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
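The shared-encoder structure in this abstract can be illustrated with a minimal numpy sketch. All sizes, weights, and the "2-best" decoding are hypothetical stand-ins for learned layers, not the patent's implementation; the point is only that one encoding is computed once and consumed by both the streaming first pass and the rescoring second pass.

```python
import numpy as np

rng = np.random.default_rng(0)
D_AUDIO, D_ENC, VOCAB = 8, 16, 10            # toy sizes, not from the patent

# Shared encoder: one tanh layer standing in for the stacked recurrent layers.
W_enc = rng.standard_normal((D_AUDIO, D_ENC)) * 0.1
# First-pass ("RNN-T") and second-pass ("LAS") projection stand-ins.
W_rnnt = rng.standard_normal((D_ENC, VOCAB)) * 0.1
W_las = rng.standard_normal((D_ENC, VOCAB)) * 0.1

def shared_encoder(frames):
    """Encode acoustic frames (T, D_AUDIO) once, for use by BOTH passes."""
    return np.tanh(frames @ W_enc)

def first_pass(enc):
    """Streaming stand-in: per-frame top-2 tokens as a crude 2-best list."""
    order = np.argsort(-(enc @ W_rnnt), axis=-1)
    return [tuple(order[:, 0]), tuple(order[:, 1])]

def second_pass(enc, n_best):
    """Revise/rescore the streaming candidates with a global context score."""
    context = enc.mean(axis=0)               # crude uniform-attention context
    token_scores = context @ W_las
    return max(n_best, key=lambda hyp: sum(token_scores[t] for t in hyp))

frames = rng.standard_normal((5, D_AUDIO))
enc = shared_encoder(frames)                 # computed once, shared by both passes
candidates = first_pass(enc)                 # streaming candidate recognitions
final = second_pass(enc, candidates)         # revised text representation
```

Sharing the encoder this way is what lets the second pass run without re-encoding the audio, which is the efficiency argument for on-device two-pass ASR.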
-
Publication No.: US20230343328A1
Publication Date: 2023-10-26
Application No.: US18336211
Application Date: 2023-06-16
Applicant: Google LLC
Inventor: Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman
CPC classification number: G10L15/063 , G10L15/02 , G10L15/22 , G10L15/30
Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.
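The language-model rescoring step at the end of this abstract can be sketched as log-linear interpolation of the decoder's distribution with an external LM's distribution, followed by renormalization (a shallow-fusion-style rule; the interpolation weight and the rule itself are assumptions, not the patent's specified method):

```python
import numpy as np

def rescore(p_asr, p_lm, lm_weight=0.3):
    """Fuse per-token ASR and LM probabilities, then renormalize."""
    log_p = np.log(p_asr) + lm_weight * np.log(p_lm)
    p = np.exp(log_p - log_p.max())          # subtract max for stability
    return p / p.sum()                       # rescored probability distribution

p_asr = np.array([0.70, 0.20, 0.10])         # decoder output over 3 toy tokens
p_lm = np.array([0.05, 0.80, 0.15])          # LM strongly prefers token 1
p_new = rescore(p_asr, p_lm)
```

The rescored distribution shifts mass toward tokens the language model considers likely while remaining a valid probability distribution.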
-
Publication No.: US20230298591A1
Publication Date: 2023-09-21
Application No.: US18123060
Application Date: 2023-03-17
Applicant: Google LLC
Inventor: Shaojin Ding , Rajeev Rikhye , Qiao Liang , Yanzhang He , Quan Wang , Arun Narayanan , Tom O'Malley , Ian McGraw
Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
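The FiLM step described here is a small, concrete computation: derive a scaling vector and a shifting vector from the target speaker embedding, then apply them as an affine transform to the reference embedding. The sketch below uses random matrices as hypothetical stand-ins for the learned projections, and cosine similarity as a stand-in for the patent's classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                        # toy embedding size (assumption)

# Hypothetical stand-ins for the learned layers that produce FiLM parameters.
W_scale = rng.standard_normal((D, D)) * 0.05
W_shift = rng.standard_normal((D, D)) * 0.05

def film_params(target_emb):
    """Scaling vector (gamma) and shifting vector (beta) from the target."""
    return 1.0 + target_emb @ W_scale, target_emb @ W_shift

def is_target_speaker(reference_emb, target_emb, threshold=0.5):
    gamma, beta = film_params(target_emb)
    modulated = gamma * reference_emb + beta     # affine transformation output
    # Stand-in classification output: cosine similarity vs. the target.
    cos = modulated @ target_emb / (
        np.linalg.norm(modulated) * np.linalg.norm(target_emb))
    return bool(cos > threshold)

target = rng.standard_normal(D)
same = is_target_speaker(target * 1.05, target)  # near-identical embedding
diff = is_target_speaker(-target, target)        # very different embedding
```

Conditioning the transform on the target embedding, rather than comparing raw embeddings directly, is what the FiLM parameters contribute.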
-
Publication No.: US11741966B2
Publication Date: 2023-08-29
Application No.: US17964141
Application Date: 2022-10-12
Applicant: GOOGLE LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A. U. Bacchiani
CPC classification number: G10L15/26 , G10L15/32 , H04M1/02 , H04M1/663 , H04M3/4286 , H04M3/5191
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
-
Publication No.: US20230079828A1
Publication Date: 2023-03-16
Application No.: US17643825
Application Date: 2021-12-11
Applicant: Google LLC
Inventor: Turaj Zakizadeh Shabestary , Arun Narayanan
IPC: G10L21/0224 , G10L21/0232
Abstract: A method for Short-Time Fourier Transform-based echo muting includes receiving a microphone signal including acoustic echo captured by a microphone and corresponding to audio content from an acoustic speaker, and receiving a reference signal including a sequence of frames representing the audio content. For each frame in the sequence of frames, the method includes processing the respective frame, using an acoustic echo canceler, to generate a respective output signal frame that cancels the acoustic echo from the respective frame, and determining, using a Double-talk Detector (DTD), based on the respective frame and the respective output signal frame, whether the respective frame includes a double-talk frame or an echo-only frame. The method also includes muting the respective output signal frame for each respective frame that includes the echo-only frame, and performing speech processing on the respective output signal frame for each respective frame that includes the double-talk frame.
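The mute-or-pass decision per frame can be sketched with an energy-ratio double-talk test: if the echo canceler removed almost everything, the frame was echo-only and is muted; residual energy implies near-end speech, so the frame is kept for speech processing. The 0.1 ratio threshold is an assumption for illustration, not a value from the patent:

```python
import numpy as np

def mute_echo_only(mic_frames, aec_out_frames, ratio=0.1):
    """Mute AEC output frames classified as echo-only; keep double-talk frames."""
    processed = []
    for mic, out in zip(mic_frames, aec_out_frames):
        e_mic = np.sum(mic ** 2) + 1e-12     # input frame energy
        e_out = np.sum(out ** 2)             # residual energy after AEC
        if e_out / e_mic < ratio:            # echo-only frame -> mute it
            processed.append(np.zeros_like(out))
        else:                                # double-talk frame -> keep for ASR
            processed.append(out)
    return processed

mic = [np.ones(4), np.ones(4)]
aec = [np.full(4, 0.01), np.full(4, 0.8)]    # frame 0: echo cancelled well
clean = mute_echo_only(mic, aec)
```

Muting the echo-only frames outright, rather than passing attenuated residual echo downstream, is what keeps leaked playback audio out of the recognizer.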
-
Publication No.: US20230038982A1
Publication Date: 2023-02-09
Application No.: US17644108
Application Date: 2021-12-14
Applicant: Google LLC
Inventor: Arun Narayanan , Tom O'Malley , Quan Wang , Alex Park , James Walker , Nathan David Howard , Yanzhang He , Chung-Cheng Chiu
IPC: G10L21/0216 , G10L15/06 , H04R3/04 , G06N3/04
Abstract: A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding vector to generate enhanced speech features.
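A crude way to picture the contextual frontend's conditioning is to append whatever context is available (reference signal features, noise context, or a speaker embedding) to each frame of input speech features before a projection. The concatenate-and-project scheme below is an illustrative assumption, not the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_CTX = 6, 4                         # toy feature/context sizes

# Hypothetical stand-in for the learned frontend layers.
W_proj = rng.standard_normal((D_FEAT + D_CTX, D_FEAT)) * 0.1

def enhance(features, context=None):
    """features: (T, D_FEAT); context: (D_CTX,) or None if unavailable."""
    ctx = np.zeros(D_CTX) if context is None else context
    fused = np.hstack([features, np.tile(ctx, (features.shape[0], 1))])
    return np.tanh(fused @ W_proj)           # enhanced speech features

feats = rng.standard_normal((3, D_FEAT))
speaker_emb = rng.standard_normal(D_CTX)     # e.g. a target-speaker embedding
out = enhance(feats, speaker_emb)
```

The zero-vector fallback mirrors the abstract's "at least one of" framing: the frontend still runs when a given context signal is absent.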
-
Publication No.: US20220122622A1
Publication Date: 2022-04-21
Application No.: US17237021
Application Date: 2021-04-21
Applicant: Google LLC
Inventor: Arun Narayanan , Tara Sainath , Chung-Cheng Chiu , Ruoming Pang , Rohit Prabhavalkar , Jiahui Yu , Ehsan Variani , Trevor Strohman
Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a first probability distribution over possible speech recognition hypotheses.
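The encoder-on-encoder cascade reduces to a simple composition: the second encoder consumes the first encoder's outputs, and the decoder consumes the second encoder's outputs. In this sketch each component is a single random layer (sizes and weights are assumptions, not the patent's model):

```python
import numpy as np

rng = np.random.default_rng(0)
D_AUDIO, D1, D2, VOCAB = 8, 12, 12, 10

W1 = rng.standard_normal((D_AUDIO, D1)) * 0.1    # first encoder stand-in
W2 = rng.standard_normal((D1, D2)) * 0.1         # second encoder, stacked on top
W_dec = rng.standard_normal((D2, VOCAB)) * 0.1   # decoder projection stand-in

def asr_step(frames):
    h1 = np.tanh(frames @ W1)        # first higher order feature representations
    h2 = np.tanh(h1 @ W2)            # second higher order feature representations
    logits = h2 @ W_dec
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # distribution over hypotheses

probs = asr_step(rng.standard_normal((4, D_AUDIO)))
```

In cascaded-encoder ASR systems of this shape, the first encoder is typically the cheaper, streaming-friendly one and the second adds quality on top, though the abstract itself does not fix that split.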
-
Publication No.: US12118988B2
Publication Date: 2024-10-15
Application No.: US17933307
Application Date: 2022-09-19
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman
IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22
CPC classification number: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/063 , G10L15/083 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
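The distinctive piece here is that the second pass attends to two streams: the second higher order features and a text encoding of the first-pass hypothesis (a deliberation-style design). The sketch below wires those streams together with random single-layer stand-ins; the token embedding table and concatenation fusion are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D_AUDIO, D_ENC, D_TXT, VOCAB = 8, 12, 12, 10

W_enc1 = rng.standard_normal((D_AUDIO, D_ENC)) * 0.1  # first encoder
W_dec1 = rng.standard_normal((D_ENC, VOCAB)) * 0.1    # first-pass transducer
E_txt = rng.standard_normal((VOCAB, D_TXT)) * 0.1     # text-encoder embedding table
W_enc2 = rng.standard_normal((D_ENC, D_ENC)) * 0.1    # second encoder
W_dec2 = rng.standard_normal((D_ENC + D_TXT, VOCAB)) * 0.1  # second-pass decoder

frames = rng.standard_normal((5, D_AUDIO))
h1 = np.tanh(frames @ W_enc1)                    # first higher order features
hyp1 = (h1 @ W_dec1).argmax(axis=-1)             # first-pass hypothesis (greedy)
text_enc = E_txt[hyp1]                           # text encoding of the hypothesis
h2 = np.tanh(h1 @ W_enc2)                        # second higher order features
hyp2 = (np.hstack([h2, text_enc]) @ W_dec2).argmax(axis=-1)  # second pass
```

Feeding the first-pass text back in gives the second pass something the audio alone does not: an explicit draft transcript to confirm or correct.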
-
Publication No.: US11783849B2
Publication Date: 2023-10-10
Application No.: US17303822
Application Date: 2021-06-08
Applicant: Google LLC
Inventor: Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan
IPC: G10L15/16 , G10L25/30 , G10L21/028 , G10L21/0388 , G10L19/008 , G10L15/20 , G10L21/0208 , G10L21/0216
CPC classification number: G10L25/30 , G10L15/16 , G10L15/20 , G10L19/008 , G10L21/028 , G10L21/0388 , G10L2021/02087 , G10L2021/02166
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
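The two front-end stages can be pictured as: a spatial step that linearly combines the two raw channels (a crude beamformer), then a spectral step applied to the frequency-domain representation of that output. The fixed weights and hard low-pass mask below are hypothetical; in the patent both stages are learned neural network layers:

```python
import numpy as np

def spatial_filter(sig1, sig2, w1=0.6, w2=0.4):
    """Combine the two microphone channels (fixed weights, for illustration)."""
    return w1 * sig1 + w2 * sig2

def spectral_filter(sig, keep_bins=8):
    """Mask the signal's frequency-domain representation, then invert."""
    spec = np.fft.rfft(sig)
    mask = np.zeros_like(spec)
    mask[:keep_bins] = 1.0                   # crude low-pass spectral mask
    return np.fft.irfft(spec * mask, n=len(sig))

t = np.linspace(0, 1, 64, endpoint=False)
ch1 = np.sin(2 * np.pi * 3 * t)              # same source captured on channel 1
ch2 = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.default_rng(0).standard_normal(64)
out = spectral_filter(spatial_filter(ch1, ch2))   # front-end output
```

In the patent the output of this front end then feeds additional layers that predict sub-word units, so the filtering is trained jointly with recognition rather than designed by hand as above.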
-
Publication No.: US11699453B2
Publication Date: 2023-07-11
Application No.: US17005823
Application Date: 2020-08-28
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216
CPC classification number: G10L21/0208 , G06F3/167 , G06F17/142 , G06N3/02 , G10L15/063 , G10L15/065 , G10L15/20 , G10L15/22 , G10L2015/223 , G10L2021/02082 , G10L2021/02166
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).