-
61.
Publication No.: US11978432B2
Publication Date: 2024-05-07
Application No.: US18204324
Filing Date: 2023-05-31
Applicant: GOOGLE LLC
IPC Classification: G10L13/047, G10L15/06
CPC Classification: G10L13/047, G10L15/063, G10L2015/0635
Abstract: Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech audio data, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
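A minimal sketch of the text-to-speech, speech-to-text, gradient loop this abstract describes. The model objects, tokenizer interface, and upload callback are hypothetical stand-ins supplied by the caller, not Google's implementation; the CTC loss is one plausible choice of training objective.

```python
import torch


def on_device_update(textual_segment: str,
                     tts_model: torch.nn.Module,       # local speech synthesis model (assumed)
                     asr_model: torch.nn.Module,       # local speech recognition model (assumed)
                     tokenizer,                        # assumed: .encode(text) -> 1-D LongTensor of ids
                     optimizer: torch.optim.Optimizer,
                     upload_fn=None):                  # hypothetical RPC to the remote system
    # 1. Synthesize speech audio for a locally stored textual segment.
    with torch.no_grad():
        synth_audio = tts_model(textual_segment)       # [1, num_samples]

    # 2. Run the on-device ASR model over the synthesized audio.
    logits = asr_model(synth_audio)                    # [1, time, vocab]

    # 3. Compare predicted output against the ground-truth text and build a gradient.
    target = tokenizer.encode(textual_segment)
    loss = torch.nn.functional.ctc_loss(
        logits.log_softmax(-1).transpose(0, 1),        # [time, 1, vocab]
        target.unsqueeze(0),
        input_lengths=torch.tensor([logits.shape[1]]),
        target_lengths=torch.tensor([target.numel()]))
    loss.backward()

    if upload_fn is not None:
        # 4b. Ship the gradients for remote aggregation into the global model weights.
        grads = {name: p.grad.clone() for name, p in asr_model.named_parameters()
                 if p.grad is not None}
        upload_fn(grads)
    else:
        # 4a. Apply the gradient locally to update the on-device ASR weights.
        optimizer.step()
    optimizer.zero_grad()
```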
-
62.
Publication No.: US20240144912A1
Publication Date: 2024-05-02
Application No.: US18280159
Filing Date: 2021-03-10
Inventors: Junji WATANABE, Aiko MURATA
CPC Classification: G10L15/04, G10L15/02, G10L15/063
Abstract: An estimation apparatus includes an estimation unit that estimates a future incident-occurrence quantitative value for a region. The estimation is based on at least two inputted psychological-state/sensibility expressing words emitted in a predetermined region and on the order in which those words were input. The estimation unit uses an estimation model whose input is at least a time series of two or more psychological-state/sensibility expressing words emitted in the predetermined region before a certain time, and whose output is an incident-occurrence quantitative value, i.e., a quantitative value of the occurrence of a predetermined event in the region after that time.
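A toy sketch of the estimation step: an ordered sequence of psychological-state/sensibility words observed in a region is mapped to a predicted incident-occurrence value, with the input order contributing through word-pair transitions. The word weights and transition weights below are illustrative assumptions, not values from the publication.

```python
from typing import Dict, List, Tuple


def estimate_incidents(words_in_order: List[str],
                       word_weights: Dict[str, float],
                       transition_weights: Dict[Tuple[str, str], float]) -> float:
    """Return a quantitative estimate of future incident occurrence in the region."""
    # Contribution of each expressed word, independent of order.
    score = sum(word_weights.get(w, 0.0) for w in words_in_order)
    # Contribution of the input order: consecutive word pairs carry extra weight,
    # so "calm -> angry" can score differently from "angry -> calm".
    for prev, curr in zip(words_in_order, words_in_order[1:]):
        score += transition_weights.get((prev, curr), 0.0)
    return score


# Example with made-up weights for two observed words.
print(estimate_incidents(
    ["anxious", "angry"],
    word_weights={"anxious": 0.4, "angry": 0.9},
    transition_weights={("anxious", "angry"): 0.5}))
```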
-
63.
Publication No.: US11967307B2
Publication Date: 2024-04-23
Application No.: US17174845
Filing Date: 2021-02-12
Inventor: Suraj Shinde
CPC Classification: G10L15/16, G06N3/045, G06N20/00, G10L15/063, G10L15/22, G10L2015/223
Abstract: Techniques are disclosed for applying a trained machine learning model to incoming voice communications to determine whether the voice communications are genuine or not genuine. The trained machine learning model may identify vocal attributes within the target call and use the identified attributes, together with its training, to determine whether the target call is genuine or not genuine. An applied trained machine learning model may include multiple different types of trained machine learning models, where each type is trained and/or configured for a different function within the analysis.
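A rough sketch of the two-stage analysis described above: one set of trained components extracts vocal attributes from the target call, and a second trained component scores those attributes as genuine or not. Both stages here are toy placeholder callables, not the patented models, and the attribute names mentioned in the comments are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CallVerdict:
    genuine: bool
    score: float


def screen_call(audio_frames: Sequence[float],
                attribute_extractors: List[Callable[[Sequence[float]], float]],
                classifier: Callable[[List[float]], float],
                threshold: float = 0.5) -> CallVerdict:
    # Each extractor stands in for a separately trained/configured model responsible
    # for one vocal attribute (e.g. pitch stability, spectral flatness, pause pattern).
    attributes = [extract(audio_frames) for extract in attribute_extractors]
    # A second trained model maps the attribute vector to a genuineness score.
    score = classifier(attributes)
    return CallVerdict(genuine=score >= threshold, score=score)


# Toy usage with stand-in extractors and a trivial averaging "classifier".
verdict = screen_call(
    audio_frames=[0.1, -0.2, 0.05],
    attribute_extractors=[lambda a: max(a), lambda a: sum(a) / len(a)],
    classifier=lambda attrs: sum(attrs) / len(attrs))
print(verdict)
```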
-
64.
Publication No.: US20240127801A1
Publication Date: 2024-04-18
Application No.: US17965226
Filing Date: 2022-10-13
Inventors: Tohru Nagano, Gakuto Kurata
CPC Classification: G10L15/16, G10L15/02, G10L15/063, G10L15/30, G10L2015/022
Abstract: Methods, systems, and computer program products for domain adaptive speech recognition using artificial intelligence are provided herein. A computer-implemented method includes generating a set of language data candidates, each language data candidate comprising one or more graphemes, by processing a sequence of phonemes related to input speech data using an artificial intelligence-based data conversion model; determining, for a target pair of phonemes and graphemes, a subset of graphemes from the set of language data candidates; generating a first speech recognition output by processing the subset of graphemes using at least one biasing language model and an artificial intelligence-based speech recognition model; generating a second speech recognition output by replacing at least a portion of the subset of graphemes in the first speech recognition output with at least one of the graphemes from the target pair; and performing automated actions based on the second speech recognition output.
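A condensed sketch of the claimed pipeline: phonemes are converted to grapheme candidates, a subset is selected for a target phoneme/grapheme pair, a biased first-pass recognition output is produced, and a second output is formed by substituting the target grapheme back in. Every model here is a stand-in callable supplied by the caller; nothing below is the actual system from the publication.

```python
from typing import Callable, List, Sequence, Tuple


def domain_adaptive_decode(
        phonemes: Sequence[str],
        p2g_model: Callable[[Sequence[str]], List[str]],   # phoneme-to-grapheme converter (assumed)
        asr_with_bias: Callable[[List[str]], str],          # ASR decode biased toward a word list (assumed)
        target_pair: Tuple[str, str]) -> str:
    target_phoneme, target_grapheme = target_pair

    # 1. Generate grapheme candidates from the phoneme sequence.
    candidates = p2g_model(phonemes)

    # 2. Keep only the candidates relevant to the target phoneme/grapheme pair.
    subset = [g for g in candidates if target_grapheme in g]

    # 3. First speech recognition output, biased toward the candidate subset.
    first_pass = asr_with_bias(subset)

    # 4. Second output: splice the target grapheme over the biased candidates.
    second_pass = first_pass
    for g in subset:
        second_pass = second_pass.replace(g, target_grapheme)
    return second_pass
```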
-
65.
Publication No.: US20240127796A1
Publication Date: 2024-04-18
Application No.: US18277552
Filing Date: 2021-02-18
Inventors: Hiroshi SATO, Takaaki FUKUTOMI, Yusuke SHINOHARA
CPC Classification: G10L15/063, G10L15/16, G10L2015/0635
Abstract: The present invention estimates the intention of an utterance more accurately than related-art approaches. A learning device learns an estimation model on the basis of learning data that include an acoustic signal for learning and a label indicating whether or not the acoustic signal was uttered to a predetermined target. The learning device includes: a feature synchronization unit configured to obtain a post-synchronization feature by synchronizing an acoustic feature obtained from the acoustic signal for learning with a text feature corresponding to the acoustic signal; an utterance intention estimation unit configured to estimate, using the post-synchronization feature, whether or not the acoustic signal was uttered to the predetermined target; and a parameter update unit configured to update a parameter of the estimation model on the basis of the label included in the learning data and the estimation result of the utterance intention estimation unit.
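A schematic training step for the estimator described above: frame-level acoustic features are synchronized with token-level text features, the combined feature is used to predict whether the utterance was addressed to the target, and parameters are updated from the label. The tensor shapes, nearest-frame alignment, and mean pooling are illustrative assumptions, not the invention's specified mechanism.

```python
import torch


def train_step(acoustic_feats: torch.Tensor,     # [num_frames, d_a]
               text_feats: torch.Tensor,         # [num_tokens, d_t]
               token_frame_index: torch.Tensor,  # [num_tokens] acoustic frame aligned to each token
               estimator: torch.nn.Module,       # maps pooled features -> single logit (assumed)
               label: torch.Tensor,              # 1.0 if uttered to the target, else 0.0
               optimizer: torch.optim.Optimizer) -> float:
    # Feature synchronization: pick, for every text token, the acoustic frame it aligns to,
    # then concatenate the two views into one post-synchronization feature per token.
    synced_acoustic = acoustic_feats[token_frame_index]            # [num_tokens, d_a]
    post_sync = torch.cat([synced_acoustic, text_feats], dim=-1)   # [num_tokens, d_a + d_t]

    # Utterance intention estimation: pool over tokens and score the utterance.
    logit = estimator(post_sync.mean(dim=0, keepdim=True)).squeeze()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logit, label)

    # Parameter update from the supervised label.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```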
-
66.
Publication No.: US11961509B2
Publication Date: 2024-04-16
Application No.: US16839308
Filing Date: 2020-04-03
Inventors: Swadheen Kumar Shukla, Lars Hasso Liden, Thomas Park, Matthew David Mazzola, Shahin Shayandeh, Jianfeng Gao, Eslam Kamal Abdelreheem
IPC Classification: G10L15/00, G06N3/044, G06N3/049, G06N3/08, G10L15/06, G10L15/16, G10L15/22, G10L25/30
CPC Classification: G10L15/063, G06N3/044, G06N3/049, G06N3/08, G10L15/16, G10L15/22, G10L25/30, G10L2015/0635, G10L2015/225
Abstract: Methods and systems are disclosed for improving dialog management for task-oriented dialog systems. The disclosed dialog builder leverages machine teaching to improve development of dialog managers. In this way, the dialog builder combines the strengths of both rule-based and machine-learned approaches to allow dialog authors to: (1) import a dialog graph developed using popular dialog composers, (2) convert the dialog graph to text-based training dialogs, (3) continuously improve the trained dialogs based on log dialogs, and (4) generate a corrected dialog for retraining the machine learning model.
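A small sketch of step (2) above: walking an imported dialog graph and flattening each root-to-leaf path into a text-based training dialog that a machine-teaching loop could then refine. The graph dictionary format is an assumption for illustration; it is not the schema of any particular dialog composer.

```python
from typing import Dict, List


def graph_to_training_dialogs(graph: Dict[str, dict],
                              node_id: str = "root",
                              prefix: List[str] = None) -> List[List[str]]:
    """Enumerate every path through the dialog graph as a list of turn strings."""
    prefix = prefix or []
    node = graph[node_id]
    turns = prefix + [f"user: {node['user']}", f"bot: {node['bot']}"]
    children = node.get("next", [])
    if not children:                       # leaf: one complete training dialog
        return [turns]
    dialogs = []
    for child_id in children:              # branch: recurse into every continuation
        dialogs.extend(graph_to_training_dialogs(graph, child_id, turns))
    return dialogs


# Toy two-branch dialog graph.
graph = {
    "root": {"user": "book a table", "bot": "for how many people?", "next": ["two", "four"]},
    "two":  {"user": "two", "bot": "booked for two."},
    "four": {"user": "four", "bot": "booked for four."},
}
for dialog in graph_to_training_dialogs(graph):
    print(dialog)
```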
-
67.
Publication No.: US20240120108A1
Publication Date: 2024-04-11
Application No.: US18463685
Filing Date: 2023-09-08
Inventors: Michael Griffin, Hailey Kotvis, Josephine Miner, Porter Moody, Kayla Poulsen, Austin Malmin, Sarah Onstad-Hawes, Gloria Solovey, Austin Streitmatter
IPC Classification: G16H50/30, G06T7/00, G06V10/774, G06V20/40, G10L15/02, G10L15/06, G10L15/18, G10L25/66
CPC Classification: G16H50/30, G06T7/0012, G06V10/774, G06V20/41, G06V20/46, G10L15/02, G10L15/063, G10L15/1815, G10L25/66, G16H10/60
Abstract: Apparatus and associated methods relate to enhancing care of a patient using video and audio analytics. Video data, audio data, and semantic text data are extracted from a video stream of the patient. The video data are analyzed to identify a first feature set, the audio data to identify a second feature set, and the semantic text data to identify a third feature set. Using a computer-implemented machine-learning model, a health outcome of the patient is predicted based on the first, second, and/or third feature sets. The predicted health outcome is compared with the set of health outcomes of the training patients classified with the patient's classification. Differences are then identified between the feature sets corresponding to the patient and the feature sets of training patients who have better health outcomes than the patient's predicted health outcome, and the identified differences are reported.
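A compressed sketch of the comparison step: predict the patient's outcome from the combined modality feature sets, then average the feature differences against training patients in the same classification whose recorded outcomes beat the prediction. The flat feature dictionaries and the predict callable are placeholders, not the clinical model described in the publication.

```python
from typing import Callable, Dict, List


def report_feature_gaps(patient_feats: Dict[str, float],
                        predict_outcome: Callable[[Dict[str, float]], float],
                        cohort: List[Dict[str, float]],
                        cohort_outcomes: List[float]) -> Dict[str, float]:
    # Predict this patient's health outcome from the video/audio/text features.
    predicted = predict_outcome(patient_feats)

    # Compare only against training patients whose recorded outcome beats the prediction.
    better = [f for f, o in zip(cohort, cohort_outcomes) if o > predicted]

    # Average the per-feature differences to surface in the report.
    gaps: Dict[str, float] = {}
    for feats in better:
        for name, value in feats.items():
            diff = value - patient_feats.get(name, 0.0)
            gaps[name] = gaps.get(name, 0.0) + diff / max(len(better), 1)
    return gaps
```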
-
68.
Publication No.: US11955134B2
Publication Date: 2024-04-09
Application No.: US17643848
Filing Date: 2021-12-13
Applicant: Google LLC
Inventors: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays
IPC Classification: G10L21/0332, G10L15/06, G10L15/08, G10L21/10
CPC Classification: G10L21/0332, G10L15/063, G10L15/08, G10L21/10
Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
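A sketch of the leakage test this abstract describes: silence the span of audio that carries the particular phrase, re-run the trained ASR model, and flag a leak if the phrase still appears in the prediction. The zero-out obfuscation and the asr callable interface are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def phrase_leak_check(samples: List[float],
                      phrase_span: Tuple[int, int],    # sample range where the phrase is spoken (assumed known)
                      particular_phrase: str,
                      ground_truth: str,
                      asr: Callable[[List[float]], str]) -> bool:
    assert particular_phrase in ground_truth, "phrase must appear in the reference transcript"

    # Obfuscate the phrase: here, simply zero out its audio samples.
    start, end = phrase_span
    modified = samples[:start] + [0.0] * (end - start) + samples[end:]

    # Transcribe the modified audio with the trained ASR model.
    predicted = asr(modified)

    # If the model still transcribes the phrase it never heard, it likely memorized it
    # from its training data set.
    return particular_phrase.lower() in predicted.lower()
```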
-
69.
Publication No.: US20240112673A1
Publication Date: 2024-04-04
Application No.: US17958887
Filing Date: 2022-10-03
Applicant: GOOGLE LLC
Inventors: Rajiv Mathews, Rohit Prabhavalkar, Giovanni Motta, Mingqing Chen, Lillian Zhou, Dhruv Guliani, Harry Zhang, Trevor Strohman, Françoise Beaufays
IPC Classification: G10L15/197, G10L15/06, G10L15/22, G10L15/30
CPC Classification: G10L15/197, G10L15/063, G10L15/22, G10L15/30, G10L2015/0635
Abstract: Implementations described herein identify and correct automatic speech recognition (ASR) misrecognitions. For example, on-device processor(s) of a client device may generate a predicted textual segment that is predicted to correspond to a spoken utterance of a user of the client device, and may receive further input that modifies the predicted textual segment to an alternate textual segment. Further, the on-device processor(s) may store these textual segments in on-device storage as a candidate correction pair, and transmit the candidate correction pair to a remote system. Moreover, remote processor(s) of the remote system may determine that the candidate correction pair is an actual correction pair, and may cause client devices to generate updates for a global ASR model for the candidate correction pair. Additionally, the remote processor(s) may distribute the global ASR model to the client devices and/or additional client devices.
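A sketch of the client/server split outlined above: the device records a (predicted, alternate) pair whenever the user edits a transcript, and the remote side keeps only pairs that look like genuine recognition corrections before scheduling a global-model update. The string-similarity heuristic for "actual correction" is an illustrative assumption, not the criterion used in the publication.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

CorrectionPair = Tuple[str, str]   # (predicted_text, alternate_text)


def record_candidate(pairs: List[CorrectionPair], predicted: str, alternate: str) -> None:
    """On-device: store a candidate correction pair when the user edits the ASR output."""
    if predicted != alternate:
        pairs.append((predicted, alternate))


def filter_actual_corrections(pairs: List[CorrectionPair],
                              min_similarity: float = 0.6) -> List[CorrectionPair]:
    """Remote system: keep pairs where the edit looks like a misrecognition fix,
    not a wholesale rewrite of the sentence."""
    kept = []
    for predicted, alternate in pairs:
        similarity = SequenceMatcher(None, predicted, alternate).ratio()
        if min_similarity <= similarity < 1.0:
            kept.append((predicted, alternate))
    return kept


pairs: List[CorrectionPair] = []
record_candidate(pairs, "call dr smith", "call dr schmidt")
print(filter_actual_corrections(pairs))
```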
-
70.
Publication No.: US11948567B2
Publication Date: 2024-04-02
Application No.: US17418314
Filing Date: 2019-10-04
Inventors: Jangho Jin, Jaehyun Bae
CPC Classification: G10L15/22, G10L15/04, G10L15/063, G10L15/1822, G10L2015/223, G10L2015/227
Abstract: The present disclosure provides an electronic device and a control method therefor. The electronic device comprises a voice reception unit and a processor. When a first user voice and a second user voice are received through the voice reception unit, the processor determines whether the second user voice is a candidate for an utterance subsequent to the first user voice, on the basis of a result obtained by dividing a plurality of attributes of the second user voice according to a predefined attribute, and controls the electronic device to perform an operation corresponding to the second user voice on the basis of the intent of the second user voice obtained through the result of that determination.
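A toy sketch of the follow-up decision described above: divide the attributes of the second voice input according to a predefined grouping, decide whether it is a candidate follow-up to the first input, and only then act on its intent. The attribute names and the majority-vote rule are assumptions, not the device's actual criteria.

```python
from typing import Dict


def is_follow_up_candidate(second_voice_attrs: Dict[str, bool]) -> bool:
    # Divide attributes into those that support a follow-up reading and those that do not.
    supporting = [name for name, holds in second_voice_attrs.items() if holds]
    opposing = [name for name, holds in second_voice_attrs.items() if not holds]
    return len(supporting) > len(opposing)


def handle_second_voice(second_voice_attrs: Dict[str, bool], intent: str) -> str:
    if is_follow_up_candidate(second_voice_attrs):
        return f"perform operation for intent: {intent}"
    return "ignore: not a follow-up to the first user voice"


print(handle_second_voice(
    {"same_speaker": True, "within_time_window": True, "refers_to_prior_slot": False},
    intent="turn volume up"))
```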
-