-
Publication No.: US20210125601A1
Publication date: 2021-04-29
Application No.: US17143140
Filing date: 2021-01-06
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A.U. Bacchiani
IPC: G10L15/06 , G10L15/16 , G10L15/183 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication No.: US20200118549A1
Publication date: 2020-04-16
Application No.: US16573323
Filing date: 2019-09-17
Applicant: Google LLC
Inventor: Georg Heigold , Erik McDermott , Vincent O. Vanhoucke , Andrew W. Senior , Michiel A.U. Bacchiani
IPC: G10L15/06 , G06N3/04 , G10L15/183 , G10L15/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
-
Publication No.: US20210090570A1
Publication date: 2021-03-25
Application No.: US16580726
Filing date: 2019-09-24
Applicant: Google LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
Publication No.: US20230038343A1
Publication date: 2023-02-09
Application No.: US17964141
Filing date: 2022-10-12
Applicant: GOOGLE LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
Publication No.: US11495233B2
Publication date: 2022-11-08
Application No.: US17505913
Filing date: 2021-10-20
Applicant: GOOGLE LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
Publication No.: US20220238112A1
Publication date: 2022-07-28
Application No.: US17722960
Filing date: 2022-04-18
Applicant: Google LLC
Inventor: Chanwoo Kim , Rajeev Conrad Nongpiur , Michiel A.U. Bacchiani
Abstract: Systems and methods are described for improving endpoint detection of a voice query submitted by a user. In some implementations, a synchronized video data and audio data is received. A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
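Illustrative note: the endpointing step in this abstract can be sketched roughly as below. This is not the patented implementation. The function name `endpoint_audio`, its parameters, and the assumption that lip-movement frames arrive as a list of video frame indices are all hypothetical; the sketch only shows how audio might be trimmed to the span between the first and last lip-movement frames of a synchronized recording.

```python
def endpoint_audio(samples, sample_rate, lip_frames, fps):
    """Trim audio to the span covered by lip-movement video frames.

    samples:     sequence of audio samples, synchronized with the video
    sample_rate: audio samples per second
    lip_frames:  indices of video frames showing lip movement (hypothetical input)
    fps:         video frames per second
    """
    if not lip_frames:
        # No lip movement detected, so there is nothing to endpoint.
        return samples[0:0]
    first, last = min(lip_frames), max(lip_frames)
    # Audio offset at which the first lip-movement frame begins.
    start = int(first * sample_rate / fps)
    # Audio offset at which the last lip-movement frame ends.
    end = int((last + 1) * sample_rate / fps)
    return samples[start:end]
```

The trimmed samples would then be passed to a speech recognizer; that step is outside this sketch.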
-
Publication No.: US20200027444A1
Publication date: 2020-01-23
Application No.: US16516390
Filing date: 2019-07-19
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
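Illustrative note: the training-time input selection described in this abstract (choosing, with a predetermined probability, between the known value for an element and the decoder's own output for it) resembles what is commonly called scheduled sampling. The sketch below is a minimal illustration under that assumption, not the patented method. The function name `select_decoder_inputs` and its parameters are hypothetical, and the tokens stand in for whatever input representation a real decoder would use.

```python
import random


def select_decoder_inputs(targets, predictions, sampling_prob, rng):
    """Pick each decoder input from the ground truth or the model's own output.

    targets:       known (ground-truth) elements of the training example
    predictions:   decoder outputs for the same positions
    sampling_prob: predetermined probability of using the decoder output
    rng:           a random.Random instance, passed in for reproducibility
    """
    inputs = []
    for target, predicted in zip(targets, predictions):
        # With probability sampling_prob, feed back the decoder's output;
        # otherwise feed the known value from the training example.
        use_prediction = rng.random() < sampling_prob
        inputs.append(predicted if use_prediction else target)
    return inputs
```

With `sampling_prob` set to 0.0 this reduces to ordinary teacher forcing, and with 1.0 the decoder is always fed its own outputs.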
-
Publication No.: US20180197534A1
Publication date: 2018-07-12
Application No.: US15848829
Filing date: 2017-12-20
Applicant: Google LLC
Inventor: Bo Li , Ron J. Weiss , Michiel A.U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/16 , G10L21/0224 , G10L15/26 , G10L21/0216
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
-
Publication No.: US20240265923A1
Publication date: 2024-08-08
Application No.: US18635974
Filing date: 2024-04-15
Applicant: GOOGLE LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
CPC classification number: G10L15/26 , G10L15/32 , H04M1/02 , H04M1/663 , H04M3/4286 , H04M3/5191
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
Publication No.: US20230352027A1
Publication date: 2023-11-02
Application No.: US18219480
Filing date: 2023-07-07
Applicant: GOOGLE LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
CPC classification number: G10L15/26 , H04M3/4286 , H04M1/663 , G10L15/32 , H04M3/5191 , H04M1/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.