Patent search ipc:"G10L15/28" Page 3

21.

Invention publication
DIALOGUE SYSTEM AND CONTROL METHOD THEREOF (Under examination, published)

Publication (announcement) number: US20230317080A1

Publication (announcement) date: 2023-10-05

Application number: US18081201

Application date: 2022-12-14

Applicant: Hyundai Motor Company, Kia Corporation

Inventor: In Jik Lee

IPC: G10L15/22, G10L15/18, G10L15/28, G10L13/02

CPC classification number: G10L15/222, G10L15/1815, G10L15/28, G10L13/02, G10L2015/228

Abstract: A dialogue system includes a voice recognition module provided to execute voice recognition, a storage device in which a result of the executed voice recognition is stored, and a controller configured to determine priorities of an external event and the voice recognition when the external event occurs while an utterance of a user is input, and pause the execution of the voice recognition and store a result of the voice recognition of the utterance of the user inputted before the pause in the storage device, when the controller concludes that the priority of the external event is higher than the priority of the voice recognition.
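The control flow in this abstract can be sketched roughly as follows. This is only an illustration, not the patented implementation: the class name, the integer priority scale, and the in-memory list standing in for the storage device are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogueController:
    """Toy controller. All names and the priority scale are hypothetical."""
    recognition_priority: int = 1
    stored_partials: List[str] = field(default_factory=list)  # stands in for the storage device
    paused: bool = False

    def on_external_event(self, event_priority: int, partial_result: str) -> bool:
        """Pause recognition and keep the partial result if the event outranks it."""
        if event_priority > self.recognition_priority:
            self.paused = True
            # Store what was recognized before the pause.
            self.stored_partials.append(partial_result)
        return self.paused
```

A lower-priority event leaves recognition running and stores nothing; a higher-priority event pauses it and saves the partial utterance.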

22.

Invention grant
Methods and systems for reducing latency in automated assistant interactions (In force)

Publication (announcement) number: US11763813B2

Publication (announcement) date: 2023-09-19

Application number: US17243232

Application date: 2021-04-28

Applicant: Google LLC

Inventor: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann

IPC: G10L15/22, G06F3/16, G10L15/08, G10L15/18, G10L15/28

CPC classification number: G10L15/22, G06F3/167, G10L15/083, G10L15/1815, G10L15/285, G10L2015/223

Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
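The decision step described in this abstract might look like the sketch below. The abstract does not say how the predicted latency is evaluated, so the fixed threshold, the function name, and the string labels are assumptions made for illustration only.

```python
from typing import List


def plan_audio_output(predicted_latency_s: float, threshold_s: float = 1.0) -> List[str]:
    """Return the order in which content would be audibly rendered.

    threshold_s is a hypothetical cutoff; the abstract only says the decision
    is based on the predicted latency.
    """
    if predicted_latency_s > threshold_s:
        # Fill the wait with pre-cached content, then play the real response.
        return ["pre_cached_content", "responsive_content"]
    return ["responsive_content"]
```

Under these assumptions, a command predicted to take 2.5 seconds is preceded by pre-cached content, while one predicted to take 0.2 seconds is answered directly.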

23.

Invention grant
Method and system for automatic speech recognition in resource constrained devices (In force)

Publication (announcement) number: US11735166B2

Publication (announcement) date: 2023-08-22

Application number: US17361408

Application date: 2021-06-29

Applicant: Tata Consultancy Services Limited

Inventor: Swarnava Dey, Jeet Dutta

IPC: G10L15/06, G06N3/088, G10L15/04, G10L15/16, G10L15/22, G10L15/28, G10L25/78, G06N3/044, G06N3/045, G10L15/05

CPC classification number: G10L15/063, G06N3/044, G06N3/045, G06N3/088, G10L15/04, G10L15/05, G10L15/16, G10L15/22, G10L15/28, G10L25/78, G10L15/06

Abstract: Automatic speech recognition techniques are implemented in resource constrained devices such as edge devices in internet of things where on-device speech recognition is required for low latency and privacy preservation. Existing neural network models for speech recognition have a large size and are not suitable for deployment in such devices. The present disclosure provides an architecture of a size constrained neural network and a method of training the size constrained neural network. The architecture of the size constrained neural network provides a way of increasing or decreasing number of feature blocks to achieve an accuracy-model size trade off. The method of training the size constrained neural network comprises creating a training dataset with short utterances and training the size constrained neural network with the training dataset to learn short term dependencies in the utterances. The trained size constrained neural network model is suitable for deployment in resource constrained devices.

24.

Invention publication
GUIDANCE QUERY FOR CACHE SYSTEM (Under examination, published)

Publication (announcement) number: US20230223027A1

Publication (announcement) date: 2023-07-13

Application number: US18184783

Application date: 2023-03-16

Applicant: Comcast Cable Communications, LLC

Inventor: Rui MIN, Hongcheng WANG

IPC: G06F16/632, G06F16/635, G10L15/28

CPC classification number: G06F16/634, G06F16/636, G10L15/285, G10L15/32

Abstract: A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.

25.

Invention publication
FUSED ACOUSTIC AND TEXT ENCODING FOR MULTIMODAL BILINGUAL PRETRAINING AND SPEECH TRANSLATION (Under examination, published)

Publication (announcement) number: US20230169281A1

Publication (announcement) date: 2023-06-01

Application number: US17533687

Application date: 2021-11-23

Applicant: Baidu USA, LLC

Inventor: Renjie ZHENG, Junkun CHEN, Mingbo MA, Liang HUANG

IPC: G06F40/58, G10L15/06, G10L15/28

CPC classification number: G06F40/58, G10L15/28, G10L15/063

Abstract: Representation learning for text and speech has improved many language-related tasks. However, existing methods only learn from one input modality, while a unified representation for both speech and text is needed for tasks such as end-to-end speech translation. Consequently, these methods cannot exploit various large-scale text and speech data and their performance is limited by the scarcity of parallel speech translation data. To address these problems, embodiments of a fused acoustic and text masked language model (FAT-MLM) are disclosed. FAT-MLM embodiments jointly learn a unified representation for both acoustic and text input from various types of corpora including parallel data for speech recognition and machine translation, and pure speech and text data. Within this cross-modal representation learning framework, an end-to-end model is further presented for fused acoustic and text speech translation. Experiments show that by fine-tuning from FAT-MLM, the speech translation model embodiments substantially improve translation quality.

26.

Invention grant
Systems and methods of user localization (In force)

Publication (announcement) number: US11659323B2

Publication (announcement) date: 2023-05-23

Application number: US17453633

Application date: 2021-11-04

Applicant: Sonos, Inc.

Inventor: Eric Frank

IPC: H04R1/40, G10L15/22, G10L15/28, G01S5/18, H04R27/00, H04R5/04, H04R3/00

CPC classification number: H04R1/406, G01S5/18, G10L15/22, G10L15/28, H04R27/00, H04R3/005, H04R5/04, H04R2205/021, H04R2227/003, H04R2227/005, H04R2410/00, H04R2430/23

Abstract: Systems and methods are disclosed in which a playback device transmits a first sound signal including a predetermined waveform. In one example, the playback device receives a second sound signal including at least one reflection of the first sound signal. The second sound signal is processed to determine a location of a person relative to the playback device, and a characteristic of audio reproduction by the playback device is selected, based on the determined location of the person.

27.

Invention publication
PROVIDING RELATED QUERIES TO A SECONDARY AUTOMATED ASSISTANT BASED ON PAST INTERACTIONS (Under examination, published)

Publication (announcement) number: US20230144884A1

Publication (announcement) date: 2023-05-11

Application number: US17533424

Application date: 2021-11-23

Applicant: GOOGLE LLC

Inventor: Victor Carbune, Matthew Sharifi

IPC: G10L15/22, G10L15/18, G10L15/28, G06F16/63

CPC classification number: G10L15/22, G10L15/18, G10L15/28, G06F16/63, G10L2015/228

Abstract: Systems and methods for providing audio data, from an initially invoked automated assistant to a subsequently invoked automated assistant. An initially invoked automated assistant may be invoked by a user utterance, followed by audio data that includes a query. The query is provided to a secondary automated assistant for processing. Subsequently, the user can submit a query that is related to the first query. In response, the initially invoked automated assistant provides the query to the secondary automated assistant in lieu of providing the query to other secondary automated assistants based on similarity between the first query and the subsequent query.

28.

Invention grant
Guidance query for cache system (In force)

Publication (announcement) number: US11609947B2

Publication (announcement) date: 2023-03-21

Application number: US16659262

Application date: 2019-10-21

Applicant: Comcast Cable Communications, LLC

Inventor: Rui Min, Hongcheng Wang

IPC: G06F16/65, G10L25/54, G10L15/28, G06F16/632, G06K9/62

Abstract: A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.

29.

Invention grant
Appropriate utterance estimate model learning apparatus, appropriate utterance judgement apparatus, appropriate utterance estimate model learning method, appropriate utterance judgement method, and program (In force)

Publication (announcement) number: US11587553B2

Publication (announcement) date: 2023-02-21

Application number: US16968126

Application date: 2019-02-07

Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventor: Takashi Nakamura, Takaaki Fukutomi

IPC: G10L15/06, G10L15/05, G10L15/16, G10L15/22, G10L15/28

Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the uttered speech suitable to the prescribed purpose.

30.

Invention grant
Hotword detection on multiple devices (In force)

Publication (announcement) number: US11557299B2

Publication (announcement) date: 2023-01-17

Application number: US17137157

Application date: 2020-12-29

Applicant: Google LLC

Inventor: Matthew Sharifi

IPC: G10L15/28, G10L15/22, G10L15/08, G10L17/22, G10L15/32, G10L15/01, G06F3/16

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
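The comparison step in this abstract can be illustrated with a minimal sketch. The abstract does not state which outcome of the comparison triggers recognition, so the rule used here (the device with the higher or equal hotword likelihood proceeds) and the function name are assumptions.

```python
def should_start_recognition(local_score: float, remote_score: float) -> bool:
    """Decide whether this device initiates speech recognition.

    local_score:  likelihood (0..1) computed on this device that the utterance
                  contains the hotword.
    remote_score: the same likelihood as reported by a second device.
    Assumed rule: the device with the higher score proceeds; ties go to the
    local device.
    """
    return local_score >= remote_score
```

With this assumed rule, a device scoring 0.9 against a neighbor's 0.6 would start recognition, and the neighbor, seeing the scores reversed, would not.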
