-
Publication No.: US12027154B2
Publication date: 2024-07-02
Application No.: US18167050
Filing date: 2023-02-09
Applicant: Google LLC
CPC classification: G10L15/063, G10L25/30, G10L25/78
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
-
Publication No.: US12020689B2
Publication date: 2024-06-25
Application No.: US17718357
Filing date: 2022-04-12
CPC classification: G10L15/063, G10L15/22, G10L15/285, G10L15/30, G10L2015/0633
Abstract: A computer-implemented method for virtual agent conversation training is disclosed. The computer-implemented method includes determining a current state of a first stage of a conversation between a pair of virtual agents. The computer-implemented method further includes determining a pivot distance between the current state of the first stage of the conversation and a subsequent, second stage of the conversation. The computer-implemented method further includes, responsive to determining that the pivot distance between the current state of the first stage of the conversation and the subsequent, second stage of the conversation is below a predetermined threshold, determining an angle of dislocation with respect to the pivot distance. The computer-implemented method further includes terminating the conversation based, at least in part, on determining that the angle of dislocation is above a predetermined threshold.
-
Publication No.: US20240203436A1
Publication date: 2024-06-20
Application No.: US18427869
Filing date: 2024-01-31
Applicant: Nantong University
Inventors: Shibing ZHANG, Jianrong WU, Lili GUO, Ming LI, Zhihua BAO
IPC classification: G10L21/0208, G10L15/06, G10L15/16, G10L25/51
CPC classification: G10L21/0208, G10L15/063, G10L15/16, G10L25/51, G10L2015/0633
Abstract: The present application relates to a lexicon learning-based heliumspeech unscrambling method in saturation diving. In a system including divers, a correction network, and an unscrambling network, a common working language lexicon for saturation diving operation is established and is read by the divers in different environments to generate supervision signals and vector signals of the correction network. The correction network learns heliumspeeches of the different divers at different diving depths to obtain a correction network parameter, and corrects a heliumspeech of a diver to obtain a corrected speech; the unscrambling network learns the corrected speech and completes unscrambling of the heliumspeech.
-
Publication No.: US20240203397A1
Publication date: 2024-06-20
Application No.: US18066174
Filing date: 2022-12-14
Inventors: Raphael TANG, Karun KUMAR, Kendra CHALKLEY, Liming ZHANG, Wenyan LI, Pamela SHAPIRO, Yajie MAO, Gefei YANG, Jun Ho SHIN, Geoffrey Craig MURRAY
IPC classification: G10L15/01, G06F40/169, G10L15/06, G10L15/197, G10L15/22
CPC classification: G10L15/01, G06F40/169, G10L15/063, G10L15/197, G10L15/22
Abstract: Selection of training utterances may be carried out in a sample-efficient manner, and the selected training utterances may be annotated to provide improved training information to an ASR system. A computing device may receive, from an ASR system, one or more transcript-score pairs, wherein a transcript-score pair comprises a transcription associated with a voice query and at least one score associated with the transcription. The computing device may determine a likelihood of a word error associated with each transcription of the one or more transcript-score pairs. The computing device may determine, based on the likelihood of the word error, an effect on a word-error rate of the ASR system. The computing device may send at least one of the one or more transcript-score pairs with a threshold effect on the word-error rate of the ASR system to be annotated.
-
Publication No.: US12014725B2
Publication date: 2024-06-18
Application No.: US17643861
Filing date: 2021-12-13
Applicant: Google LLC
Inventors: Ronny Huang, Tara N. Sainath
IPC classification: G10L15/16, G06N3/02, G10L15/06, G10L15/197, G10L15/22
CPC classification: G10L15/063, G06N3/02, G10L15/16, G10L15/197, G10L15/22
Abstract: A method of training a language model for rare-word speech recognition includes obtaining a set of training text samples, and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the set of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances or appear in the transcriptions from the set of training utterances less than a threshold number of times. The method further includes training an external language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.
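The rare-word filtering step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name select_rare_word_samples, the whitespace tokenization, and the lowercasing are all assumptions made for illustration. A text sample is kept when at least one of its words is absent from the transcriptions or occurs there fewer than `threshold` times.

```python
from collections import Counter


def select_rare_word_samples(text_samples, transcriptions, threshold):
    """Return the text samples that contain at least one rare word.

    A word counts as rare if it appears in the transcriptions fewer than
    `threshold` times (a word that never appears has a count of zero).
    Tokenization by whitespace and lowercasing are illustrative assumptions.
    """
    counts = Counter(
        word for line in transcriptions for word in line.lower().split()
    )
    return [
        sample
        for sample in text_samples
        if any(counts[word] < threshold for word in sample.lower().split())
    ]


transcriptions = ["play some music", "play the music", "stop the music"]
text_samples = ["play the music", "play zydeco music", "stop"]

rare_subset = select_rare_word_samples(text_samples, transcriptions, threshold=2)

# Per the abstract, the language model is then trained on the transcriptions
# together with the rare-word subset.
lm_training_text = transcriptions + rare_subset
```

With these inputs, "play the music" is dropped because every word occurs at least twice in the transcriptions, while the other two samples are kept.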
-
Publication No.: US12014722B2
Publication date: 2024-06-18
Application No.: US17197587
Filing date: 2021-03-10
IPC classification: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00
CPC classification: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303
Abstract: A method, computer program product, and computing system for receiving feature-based voice data associated with a first acoustic domain. One or more gain-based augmentations may be performed on at least a portion of the feature-based voice data, thus defining gain-augmented feature-based voice data.
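The abstract does not say how the features are represented or how gain is applied. As a hedged sketch only, assume the feature-based voice data are natural-log power features (for example log-mel energies); a gain of G dB on the waveform then corresponds to adding the constant G·ln(10)/10 to every feature value. The function name apply_gain_db is hypothetical.

```python
import math


def apply_gain_db(log_power_features, gain_db):
    """Apply a gain, given in dB, to natural-log power features.

    Assumption: each value is ln(power). Scaling power by 10**(gain_db / 10)
    is the same as adding gain_db * ln(10) / 10 in the log domain.
    """
    offset = gain_db * math.log(10.0) / 10.0
    return [[value + offset for value in frame] for frame in log_power_features]


# Two frames of two features each; a +10 dB gain adds ln(10) to every value.
features = [[0.0, 1.0], [-2.0, 0.5]]
augmented = apply_gain_db(features, gain_db=10.0)
```

Other feature types would need a different mapping, so this should be read as one possible instance of a gain-based augmentation rather than the method the patent claims.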
-
Publication No.: US12008921B2
Publication date: 2024-06-11
Application No.: US18152625
Filing date: 2023-01-10
Applicant: 617 Education Inc.
Inventor: Tom Dillon
CPC classification: G09B7/04, G06F3/167, G09B19/04, G10L15/02, G10L15/063, G10L15/22, G10L25/18, G10L25/30, G10L2015/025, G10L2015/225
Abstract: Systems and methods are described for grapheme-phoneme correspondence learning. In an example, a display of a device is caused to output a grapheme graphical user interface (GUI) that includes a grapheme. Audio data representative of a sound made by the human user is received based on the grapheme shown on the display. A grapheme-phoneme model can determine whether the sound made by the human corresponds to a phoneme for the displayed grapheme based on the audio data. The grapheme-phoneme model is trained based on augmented spectrogram data. A speaker is caused to output a sound representative of the phoneme for the grapheme to provide the human with a correct pronunciation of the grapheme in response to the grapheme-phoneme model determining that the sound made by the human does not correspond to the phoneme for the grapheme.
-
Publication No.: US20240185850A1
Publication date: 2024-06-06
Application No.: US18352601
Filing date: 2023-07-14
Inventors: Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
CPC classification: G10L15/22, G10L15/02, G10L15/063, G10L15/18, G10L25/78, G10L2015/088, G10L2015/223
Abstract: A method includes extracting, using a keyword detection model, audio features from audio data. The method also includes processing the audio features by a first layer of the keyword detection model configured to predict a first likelihood that the audio data includes speech. The method also includes processing the audio features by a second layer of the keyword detection model configured to predict a second likelihood that the audio data includes keyword-like speech. The method also includes processing the audio features by a third layer of the keyword detection model configured to predict a third likelihood, for each of a plurality of possible keywords, that the audio data includes the keyword. The method also includes identifying a keyword included in the audio data. The method also includes generating instructions to perform an action based at least in part on the identified keyword.
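The abstract names three likelihoods but does not state how they are combined to identify a keyword. The Python sketch below shows one plausible combination, assumed here for illustration: the first two likelihoods act as gates, and the highest-scoring keyword is returned only if it clears its own threshold. The function name identify_keyword and all threshold values are hypothetical.

```python
def identify_keyword(p_speech, p_keyword_like, p_keywords,
                     t_speech=0.5, t_keyword_like=0.5, t_keyword=0.5):
    """Return the detected keyword, or None if nothing is detected.

    p_speech:        likelihood that the audio contains speech
    p_keyword_like:  likelihood that the audio contains keyword-like speech
    p_keywords:      dict mapping each candidate keyword to its likelihood
    The gating order and the thresholds are assumptions, not from the patent.
    """
    if p_speech < t_speech or p_keyword_like < t_keyword_like:
        return None  # not speech, or speech that does not resemble a keyword
    if not p_keywords:
        return None  # no candidate keywords to choose from
    best = max(p_keywords, key=p_keywords.get)
    return best if p_keywords[best] >= t_keyword else None


detected = identify_keyword(0.9, 0.8, {"lights on": 0.7, "lights off": 0.2})
rejected = identify_keyword(0.9, 0.3, {"lights on": 0.7, "lights off": 0.2})
```

Here `detected` is "lights on", while `rejected` is None because the keyword-like likelihood falls below its gate.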
-
Publication No.: US20240185839A1
Publication date: 2024-06-06
Application No.: US18526148
Filing date: 2023-12-01
Applicant: Google LLC
IPC classification: G10L15/06
CPC classification: G10L15/063, G10L2015/0635
Abstract: A method for training a modular neural network model includes training only a backbone model to provide a first model configuration of the modular neural network model. The first model configuration includes only the trained backbone model. The method also includes adding an intrinsic sub-model to the trained backbone model. During a fine-tuning training stage, the method includes freezing parameters of the trained backbone model and fine-tuning parameters of the intrinsic sub-model added to the trained backbone model while the parameters of the trained backbone model are frozen to provide a second model configuration that includes the backbone model initially trained during the initial training stage and the intrinsic sub-model having the parameters fine-tuned during the fine-tuning stage.
-
Publication No.: US20240185838A1
Publication date: 2024-06-06
Application No.: US18318225
Filing date: 2023-05-16
Applicant: Openstream Inc.
CPC classification: G10L15/063, G10L15/1822, G10L2015/0635
Abstract: Described is a system and method for training a multilingual semantic parser. A method includes receiving, by a multilingual semantic parser, a multilingual training dataset, wherein the multilingual training dataset includes pairs of utterances and meaning representations from at least one high-resource language and at least one low-resource language, and wherein the multilingual training dataset is initially a machine-translated dataset; training the multilingual semantic parser by translating the utterances in the multilingual training dataset to a target language; and iteratively performing: selecting, by an acquisition functions estimator, a subset of the multilingual training dataset for human translation; updating the multilingual training dataset with the human-translated subset; and retraining the multilingual semantic parser with the updated multilingual training dataset.