Patent search: ap:("Google LLC") AND inv:"Michiel A. U. Bacchiani", page 1

1.

Granted invention patent
Asynchronous optimization for sequence training of neural networks (in force)

Publication No.: US10482873B2

Publication date: 2019-11-19

Application No.: US15910720

Filing date: 2018-03-02

Applicant: Google LLC

Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani

IPC classification: G10L15/06, G10L15/16, G10L15/183, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

2.

Granted invention patent
Complex linear projection for acoustic modeling (in force)

Publication No.: US10140980B2

Publication date: 2018-11-27

Application No.: US15386979

Filing date: 2016-12-21

Applicant: Google LLC

Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran

IPC classification: G10L15/16, G10L19/02, G10L15/02

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.

3.

Granted invention patent
Adaptive audio enhancement for multichannel speech recognition (in force)

Publication No.: US11756534B2

Publication date: 2023-09-12

Application No.: US17649058

Filing date: 2022-01-26

Applicant: Google LLC

Inventors: Bo Li, Ron Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC classification: G10L15/00, G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L21/0216

CPC classification: G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L2021/02166

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
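
Illustrative note (not part of the patent record): the abstract above describes per-channel filters whose outputs end up in a single combined channel, but it does not say how the channels are combined. The sketch below assumes a conventional filter-and-sum combination, in which each channel is passed through its own FIR filter and the results are added sample by sample. The function names `fir_filter` and `filter_and_sum` are hypothetical, and the step that generates the filter parameters from the audio is out of scope here.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: y[t] = sum over k of taps[k] * signal[t - k]."""
    out: List[float] = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if t - k >= 0:  # samples before the start of the signal count as zero
                acc += h * signal[t - k]
        out.append(acc)
    return out


def filter_and_sum(
    channels: Sequence[Sequence[float]],
    filters: Sequence[Sequence[float]],
) -> List[float]:
    """Filter each channel with its own taps, then add the results.

    Assumption: one filter per channel and equal-length channels. In the
    patent the filter parameters are generated from the audio itself; here
    they are simply passed in.
    """
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    filtered = [fir_filter(x, h) for x, h in zip(channels, filters)]
    return [sum(samples) for samples in zip(*filtered)]


# Two channels: the first passes through unchanged, the second is delayed
# by one sample before the two are summed.
combined = filter_and_sum([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                          [[1.0], [0.0, 1.0]])
```

In this toy call the second filter acts as a one-sample delay, which is the simplest case of aligning two microphones before summing them.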

4.

Granted invention patent
Asynchronous optimization for sequence training of neural networks (in force)

Publication No.: US11557277B2

Publication date: 2023-01-17

Application No.: US17644362

Filing date: 2021-12-15

Applicant: Google LLC

Inventors: Georg Heigold, Erik McDermott, Vincent O. VanHoucke, Andrew W. Senior, Michiel A. U. Bacchiani

IPC classification: G10L15/06, G10L15/16, G10L15/183, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

5.

Granted invention patent
Query endpointing based on lip detection (in force)

Publication No.: US10755714B2

Publication date: 2020-08-25

Application No.: US16412677

Filing date: 2019-05-15

Applicant: Google LLC

Inventors: Chanwoo Kim, Rajeev Conrad Nongpiur, Michiel A. U. Bacchiani

IPC classification: G10L15/22, G06F40/30, G10L15/04, G10L25/78, G10L15/25, G06K9/00, G10L15/26

Abstract: Systems and methods are described for improving endpoint detection of a voice query submitted by a user. In some implementations, a synchronized video data and audio data is received. A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
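
Illustrative note (not part of the patent record): the endpointing step in the abstract above maps the first and last video frames that show lip movement onto positions in the synchronized audio. The sketch below is one minimal reading of that step. It assumes a hypothetical per-frame boolean flag from some lip-movement detector, a constant video frame rate, and audio that starts at the same instant as the video; the name `endpoint_by_lip_movement` and all parameters are illustrative, not taken from the patent.

```python
from typing import List, Optional, Tuple


def endpoint_by_lip_movement(
    lip_moving: List[bool],
    frame_rate_hz: float,
    sample_rate_hz: int,
) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the audio spanned by lip movement.

    lip_moving[i] is a hypothetical flag saying whether video frame i shows
    lip movement. The span runs from the start of the first flagged frame to
    the end of the last flagged frame. Returns None if no frame is flagged.
    """
    moving = [i for i, flag in enumerate(lip_moving) if flag]
    if not moving:
        return None
    first, last = moving[0], moving[-1]
    # Convert frame indices to audio sample indices on the shared timeline.
    start = int(round(first * sample_rate_hz / frame_rate_hz))
    end = int(round((last + 1) * sample_rate_hz / frame_rate_hz))
    return start, end


# 25 fps video with 16 kHz audio: lip movement in frames 2 through 4.
span = endpoint_by_lip_movement(
    [False, False, True, True, True, False],
    frame_rate_hz=25.0,
    sample_rate_hz=16000,
)
```

The returned sample range would then be used to slice the audio before it is handed to a speech recognizer.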

6.

Granted invention patent
Asynchronous optimization for sequence training of neural networks (in force)

Publication No.: US12073823B2

Publication date: 2024-08-27

Application No.: US18506540

Filing date: 2023-11-10

Applicant: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani

IPC classification: G10L15/06, G06N3/045, G10L15/16, G10L15/183

CPC classification: G10L15/063, G06N3/045, G10L15/16, G10L15/183

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

7.

Granted invention patent
Automated calling system (in force)

Publication No.: US11741966B2

Publication date: 2023-08-29

Application No.: US17964141

Filing date: 2022-10-12

Applicant: GOOGLE LLC

Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani

IPC classification: G10L15/26, H04M3/428, H04M1/663, G10L15/32, H04M3/51, H04M1/02

CPC classification: G10L15/26, G10L15/32, H04M1/02, H04M1/663, H04M3/4286, H04M3/5191

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.

8.

Granted invention patent
Asynchronous optimization for sequence training of neural networks (in force)

Publication No.: US11227582B2

Publication date: 2022-01-18

Application No.: US17143140

Filing date: 2021-01-06

Applicant: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani

IPC classification: G10L15/06, G10L15/16, G10L15/183, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

9.

Granted invention patent
Adaptive audio enhancement for multichannel speech recognition (in force)

Publication No.: US10515626B2

Publication date: 2019-12-24

Application No.: US15848829

Filing date: 2017-12-20

Applicant: Google LLC

Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC classification: G10L15/00, G10L15/16, G10L21/0224, G10L15/20, G10L15/26, G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

10.

Granted invention patent
Hotword suppression (in force)

Publication No.: US11967323B2

Publication date: 2024-04-23

Application No.: US17849253

Filing date: 2022-06-24

Applicant: GOOGLE LLC

Inventors: Alexander H. Gruenstein, Taral Pradeep Joglekar, Vijayaditya Peddinti, Michiel A. U. Bacchiani

IPC classification: G10L15/22, G10L15/06, G10L15/08, G10L15/30, G10L17/00, G10L17/22, G10L25/51

CPC classification: G10L15/22, G10L15/063, G10L15/08, G10L15/30, G10L17/00, G10L17/22, G10L25/51, G10L2015/088

Abstract: A method includes adding, by a first computing device, a first audio watermark to first speech data corresponding to playback of a first utterance including a hotword used to invoke an attention of a second computing device. The method includes outputting, by the first computing device, the playback of the first utterance corresponding to the watermarked first speech data. The second computing device is configured to receive the watermarked first speech data and determine to cease processing of the watermarked first speech data.
