-
Publication number: US20230317080A1
Publication date: 2023-10-05
Application number: US18081201
Filing date: 2022-12-14
Applicant: Hyundai Motor Company , Kia Corporation
Inventor: In Jik Lee
CPC classification number: G10L15/222 , G10L15/1815 , G10L15/28 , G10L13/02 , G10L2015/228
Abstract: A dialogue system includes a voice recognition module provided to execute voice recognition, a storage device in which a result of the executed voice recognition is stored, and a controller configured to determine priorities of an external event and the voice recognition when the external event occurs while an utterance of a user is input, and pause the execution of the voice recognition and store a result of the voice recognition of the utterance of the user inputted before the pause in the storage device, when the controller concludes that the priority of the external event is higher than the priority of the voice recognition.
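The priority check in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `DialogueController`, the integer priority scale, and the string form of the partial result are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogueController:
    """Hypothetical controller; names and the priority scale are assumptions."""
    recognition_priority: int = 1
    stored_results: List[str] = field(default_factory=list)
    paused: bool = False

    def on_external_event(self, event_priority: int, partial_result: str) -> bool:
        # Pause recognition and keep the partial result only when the
        # external event outranks the ongoing voice recognition.
        if event_priority > self.recognition_priority:
            self.paused = True
            self.stored_results.append(partial_result)
        return self.paused
```

A lower-priority event leaves recognition running and stores nothing; a higher-priority event pauses it and saves whatever was recognized before the pause.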
-
Publication number: US11763813B2
Publication date: 2023-09-19
Application number: US17243232
Filing date: 2021-04-28
Applicant: Google LLC
Inventor: Lior Alon , Rafael Goldfarb , Dekel Auster , Dan Rasin , Michael Andrew Goodman , Trevor Strohman , Nino Tasca , Valerie Nygaard , Jaclyn Konzelmann
CPC classification number: G10L15/22 , G06F3/167 , G10L15/083 , G10L15/1815 , G10L15/285 , G10L2015/223
Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
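The decision described in the abstract above, whether to play pre-cached content before the real response, might be sketched as follows. This is an assumption-laden illustration, not Google's implementation: `plan_rendering`, the `predict_latency` callable standing in for the latency prediction model, the `precached` mapping, and the one-second threshold are all hypothetical.

```python
from typing import Callable, Dict, List


def plan_rendering(
    command: str,
    predict_latency: Callable[[str], float],  # stand-in for the latency prediction model
    precached: Dict[str, str],                # hypothetical command -> filler content map
    threshold_s: float = 1.0,                 # assumed cutoff, not from the source
) -> List[str]:
    """Return the ordered list of items to render audibly."""
    steps: List[str] = []
    # Render filler tailored to the command only when fulfillment is
    # predicted to be slow and suitable filler exists.
    if predict_latency(command) > threshold_s and command in precached:
        steps.append(precached[command])
    # The content responsive to the utterance always comes last.
    steps.append("<responsive content>")
    return steps
```

With a predicted latency above the threshold, the filler is rendered first; otherwise only the responsive content is returned.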
-
Publication number: US11735166B2
Publication date: 2023-08-22
Application number: US17361408
Filing date: 2021-06-29
Applicant: Tata Consultancy Services Limited
Inventor: Swarnava Dey , Jeet Dutta
IPC: G10L15/06 , G06N3/088 , G10L15/04 , G10L15/16 , G10L15/22 , G10L15/28 , G10L25/78 , G06N3/044 , G06N3/045 , G10L15/05
CPC classification number: G10L15/063 , G06N3/044 , G06N3/045 , G06N3/088 , G10L15/04 , G10L15/05 , G10L15/16 , G10L15/22 , G10L15/28 , G10L25/78 , G10L15/06
Abstract: Automatic speech recognition techniques are implemented in resource constrained devices such as edge devices in the Internet of Things, where on-device speech recognition is required for low latency and privacy preservation. Existing neural network models for speech recognition have a large size and are not suitable for deployment in such devices. The present disclosure provides an architecture of a size constrained neural network and a method of training the size constrained neural network. The architecture of the size constrained neural network provides a way of increasing or decreasing the number of feature blocks to achieve an accuracy-model size trade-off. The method of training the size constrained neural network comprises creating a training dataset with short utterances and training the size constrained neural network with the training dataset to learn short term dependencies in the utterances. The trained size constrained neural network model is suitable for deployment in resource constrained devices.
-
Publication number: US20230223027A1
Publication date: 2023-07-13
Application number: US18184783
Filing date: 2023-03-16
Applicant: Comcast Cable Communications, LLC
Inventor: Rui MIN , Hongcheng WANG
IPC: G06F16/632 , G06F16/635 , G10L15/28
CPC classification number: G06F16/634 , G06F16/636 , G10L15/285 , G10L15/32
Abstract: A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.
-
Publication number: US20230169281A1
Publication date: 2023-06-01
Application number: US17533687
Filing date: 2021-11-23
Applicant: Baidu USA, LLC
Inventor: Renjie ZHENG , Junkun CHEN , Mingbo MA , Liang HUANG
CPC classification number: G06F40/58 , G10L15/28 , G10L15/063
Abstract: Representation learning for text and speech has improved many language-related tasks. However, existing methods only learn from one input modality, while a unified representation for both speech and text is needed for tasks such as end-to-end speech translation. Consequently, these methods cannot exploit various large-scale text and speech data and their performance is limited by the scarcity of parallel speech translation data. To address these problems, embodiments of a fused acoustic and text masked language model (FAT-MLM) are disclosed. FAT-MLM embodiments jointly learn a unified representation for both acoustic and text input from various types of corpora including parallel data for speech recognition and machine translation, and pure speech and text data. Within this cross-modal representation learning framework, an end-to-end model is further presented for fused acoustic and text speech translation. Experiments show that by fine-tuning from FAT-MLM, the speech translation model embodiments substantially improve translation quality.
-
Publication number: US11659323B2
Publication date: 2023-05-23
Application number: US17453633
Filing date: 2021-11-04
Applicant: Sonos, Inc.
Inventor: Eric Frank
CPC classification number: H04R1/406 , G01S5/18 , G10L15/22 , G10L15/28 , H04R27/00 , H04R3/005 , H04R5/04 , H04R2205/021 , H04R2227/003 , H04R2227/005 , H04R2410/00 , H04R2430/23
Abstract: Systems and methods are disclosed in which a playback device transmits a first sound signal including a predetermined waveform. In one example, the playback device receives a second sound signal including at least one reflection of the first sound signal. The second sound signal is processed to determine a location of a person relative to the playback device, and a characteristic of audio reproduction by the playback device is selected, based on the determined location of the person.
-
Publication number: US20230144884A1
Publication date: 2023-05-11
Application number: US17533424
Filing date: 2021-11-23
Applicant: GOOGLE LLC
Inventor: Victor Carbune , Matthew Sharifi
CPC classification number: G10L15/22 , G10L15/18 , G10L15/28 , G06F16/63 , G10L2015/228
Abstract: Systems and methods for providing audio data, from an initially invoked automated assistant to a subsequently invoked automated assistant. An initially invoked automated assistant may be invoked by a user utterance, followed by audio data that includes a query. The query is provided to a secondary automated assistant for processing. Subsequently, the user can submit a query that is related to the first query. In response, the initially invoked automated assistant provides the query to the secondary automated assistant in lieu of providing the query to other secondary automated assistants based on similarity between the first query and the subsequent query.
-
Publication number: US11609947B2
Publication date: 2023-03-21
Application number: US16659262
Filing date: 2019-10-21
Applicant: Comcast Cable Communications, LLC
Inventor: Rui Min , Hongcheng Wang
IPC: G06F16/65 , G10L25/54 , G10L15/28 , G06F16/632 , G06K9/62
Abstract: A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.
-
Publication number: US11587553B2
Publication date: 2023-02-21
Application number: US16968126
Filing date: 2019-02-07
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takashi Nakamura , Takaaki Fukutomi
Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the uttered speech suitable to the prescribed purpose.
-
Publication number: US11557299B2
Publication date: 2023-01-17
Application number: US17137157
Filing date: 2020-12-29
Applicant: Google LLC
Inventor: Matthew Sharifi
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
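The comparison step in the abstract above can be sketched in Python under stated assumptions. The abstract does not say how the two values are compared, so the function name `should_start_recognition`, the 0.5 threshold, and the "strictly higher score wins" rule are hypothetical choices, not the patented method.

```python
def should_start_recognition(
    first_value: float,      # hotword likelihood computed on this device
    second_value: float,     # hotword likelihood reported by the other device
    threshold: float = 0.5,  # assumed minimum likelihood, not from the source
) -> bool:
    """Decide whether the first device initiates speech recognition."""
    # Hypothetical rule: this device proceeds only if it is confident the
    # hotword was spoken and its score beats the other device's score.
    return first_value >= threshold and first_value > second_value
```

Under this rule, a device that hears the hotword clearly (0.9 against 0.6) starts recognition, while a device with the lower score, or with a score below the threshold, stays idle.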