-
Publication No.: US20240078999A1
Publication Date: 2024-03-07
Application No.: US18272246
Filing Date: 2021-01-15
IPC Classification: G10L15/06
CPC Classification: G10L15/063
Abstract: A learning method includes the following processes. A shuffling process acquires learning data arranged in a time series and rearranges the learning data in an order different from the order of the time series. A learning process trains an acoustic model using the learning data rearranged through the shuffling process.
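A minimal sketch of the shuffle-then-train idea described in the abstract; the function names and the use of plain Python are illustrative assumptions, not details from the patent.

```python
import random

def shuffle_time_series(examples, seed=0):
    """Return the time-ordered examples in a different, randomized order."""
    shuffled = list(examples)          # leave the original time-series order intact
    random.Random(seed).shuffle(shuffled)
    return shuffled

def train(model, examples):
    for features, label in examples:   # one pass over the rearranged learning data
        model.update(features, label)  # placeholder for an acoustic-model update step

# usage, assuming `examples` is a time-ordered list of (features, label) pairs:
# train(model, shuffle_time_series(examples))
```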
-
Publication No.: US11924624B2
Publication Date: 2024-03-05
Application No.: US17669549
Filing Date: 2022-02-11
IPC Classification: H04S7/00, G06T7/70, G10L15/06, G10L15/22, G10L19/00, G10L19/008, G10L19/16, G10L21/0208, G10L21/0216, H04R1/10, H04R1/40, H04R3/00, H04R5/027, H04S3/00
CPC Classification: H04S7/30, G06T7/70, G10L15/063, G10L15/22, G10L19/008, G10L19/167, G10L21/0208, H04R1/406, H04R3/005, H04R5/027, H04S3/008, G10L2019/0001, G10L2019/0002, G10L2021/02166, H04R2201/401, H04S2400/01, H04S2400/15
Abstract: A method, computer program product, and computing system for selecting a reference audio acquisition device (reference microphone) from a plurality of audio acquisition devices of an audio recording system. Audio encounter information of the reference microphone may be encoded, thus defining encoded reference audio encounter information. A plurality of acoustic relative transfer functions between the reference microphone and the plurality of audio acquisition devices of the audio recording system may be generated. The encoded reference audio encounter information and a representation of the plurality of acoustic relative transfer functions may be transmitted.
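An illustrative sketch, not the patented method, of estimating an acoustic relative transfer function (RTF) between the reference microphone and another acquisition device: the RTF is approximated per frequency bin as the ratio of the accumulated cross-power spectrum to the reference auto-power spectrum.

```python
import numpy as np

def estimate_rtf(ref, mic, n_fft=512, hop=256):
    """Complex RTF estimate per frequency bin from two time-aligned signals."""
    win = np.hanning(n_fft)
    cross = np.zeros(n_fft // 2 + 1, dtype=complex)
    auto = np.zeros(n_fft // 2 + 1)
    for i in range(0, min(len(ref), len(mic)) - n_fft, hop):
        R = np.fft.rfft(win * ref[i:i + n_fft])
        M = np.fft.rfft(win * mic[i:i + n_fft])
        cross += M * np.conj(R)         # cross-power spectrum, accumulated over frames
        auto += np.abs(R) ** 2          # reference auto-power spectrum
    return cross / np.maximum(auto, 1e-12)
```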
-
Publication No.: US11922932B2
Publication Date: 2024-03-05
Application No.: US18194586
Filing Date: 2023-03-31
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
IPC Classification: G10L15/197, G10L15/02, G10L15/06, G10L15/16, G10L15/22
CPC Classification: G10L15/197, G10L15/02, G10L15/063, G10L15/16, G10L15/22, G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses a set of speech recognition hypothesis samples, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
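A minimal sketch of an attention-based encoder-decoder ASR model of the kind the abstract describes (recurrent encoder, attention module, recurrent decoder producing distributions over linguistic units). The framework, layer sizes, and use of LSTMs are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class Seq2SeqASR(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=64):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.attention = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.output = nn.Linear(hidden, vocab)            # distribution over linguistic units

    def forward(self, feats, dec_inputs):
        enc_out, _ = self.encoder(feats)                  # (B, T, H) encoded feature vectors
        dec_out, _ = self.decoder(dec_inputs)             # (B, U, H) decoder states
        context, _ = self.attention(dec_out, enc_out, enc_out)
        return self.output(context)                       # (B, U, vocab) output vectors

# usage: logits = Seq2SeqASR()(torch.randn(2, 100, 80), torch.randn(2, 10, 256))
```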
-
Publication No.: US11922928B2
Publication Date: 2024-03-05
Application No.: US17539248
Filing Date: 2021-12-01
Inventors: Ramakrishna R. Yannam, Isaac Persing, Emad Noorizadeh, Sushil Golani, Hari Gopalkrishnan, Dana Patrice Morrow Branch
CPC Classification: G10L15/16, G10L15/063, G10L15/22, G10L15/30, G10L2015/0638
Abstract: Apparatus and methods are provided for leveraging machine learning and artificial intelligence to assess the sentiment of an utterance expressed by a user during an interaction between an interactive response system and the user. The methods may include a natural language processor processing the utterance to output an utterance intent. The methods may also include a signal extractor processing the utterance, the utterance intent, and previous utterance data to output utterance signals. The methods may additionally include an utterance sentiment classifier using a hierarchy of rules to extract a label from a database, the extraction being based on the utterance signals. The methods may further include a sequential neural network classifier using a trained algorithm to process the label and a sequence of historical labels to output a sentiment score.
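A hedged sketch of the pipeline the abstract outlines: intent extraction, signal extraction, a rule-hierarchy label lookup, and a sequential classifier that scores the running label history. Every name below is illustrative rather than taken from the patent.

```python
class UtteranceSentimentPipeline:
    def __init__(self, nlp, signal_extractor, rule_classifier, sequence_model):
        self.nlp = nlp                          # natural language processor -> utterance intent
        self.signal_extractor = signal_extractor
        self.rule_classifier = rule_classifier  # hierarchy of rules -> label
        self.sequence_model = sequence_model    # trained sequential neural network

    def score(self, utterance, previous_utterance_data, label_history):
        intent = self.nlp(utterance)
        signals = self.signal_extractor(utterance, intent, previous_utterance_data)
        label = self.rule_classifier(signals)   # first matching rule in the hierarchy wins
        return self.sequence_model(label_history + [label])   # sentiment score
```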
-
Publication No.: US20240071372A1
Publication Date: 2024-02-29
Application No.: US18387288
Filing Date: 2023-11-06
Applicant: NeoSensory, Inc.
Inventors: Oleksii Abramenko, Kaan Donbekci, Michael V. Perrotta, Scott Novich, Kathleen W. McMahon, David M. Eagleman
IPC Classification: G10L15/16, G10L15/06, G10L15/187
CPC Classification: G10L15/16, G10L15/063, G10L15/187
Abstract: A system for providing information to a user includes and/or interfaces with a set of models and/or algorithms. Additionally or alternatively, the system can include and/or interface with any or all of: a processing subsystem; a sensory output device; a user device; an audio input device; and/or any other components. A method for providing information to a user includes and/or interfaces with: receiving a set of inputs; processing the set of inputs to determine a set of sensory outputs; and providing the set of sensory outputs.
-
Publication No.: US11915685B2
Publication Date: 2024-02-27
Application No.: US18189145
Filing Date: 2023-03-23
Inventors: Zhenhao Ge, Lakshmish Kaushik, Saket Kumar, Masanori Omote
CPC Classification: G10L15/063, G10L13/02
Abstract: Techniques are described for training neural networks on variable-length datasets. The numeric representation of the length of each training sample is randomly perturbed to yield a pseudo-length, and the samples are sorted by pseudo-length. This achieves a lower zero padding rate (ZPR) than completely randomized batching (thus saving computation time) yet higher randomness than strictly sorted batching (thus yielding better model performance).
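A sketch of the pseudo-length batching idea: each sample's length is randomly perturbed, samples are sorted by the perturbed value, and contiguous batches are cut from the sorted order. The perturbation scale (`jitter`) is an assumed parameter, not a value from the patent.

```python
import random

def pseudo_length_batches(samples, batch_size, jitter=0.1, seed=0):
    """samples: list of variable-length sequences; returns batches of similar length."""
    rng = random.Random(seed)
    keyed = [(len(s) * (1.0 + rng.uniform(-jitter, jitter)), s) for s in samples]
    keyed.sort(key=lambda kv: kv[0])            # sort by pseudo-length, not true length
    ordered = [s for _, s in keyed]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# A larger jitter behaves more like fully randomized batching (more randomness,
# more zero padding); a smaller jitter behaves more like strictly sorted batching.
```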
-
Publication No.: US20240062775A1
Publication Date: 2024-02-22
Application No.: US18452351
Filing Date: 2023-08-18
Applicant: Vocollect, Inc.
Inventors: David D. HARDEK
CPC Classification: G10L25/84, G10L25/51, G10L15/07, G10L15/063, G10L15/16, G10L2025/783
Abstract: A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from speech spoken by background persons, as well as from background speech from public address (PA) systems. In one aspect, the present system and method prepares, in advance of field use, a voice-data file created in a training environment. The training environment exhibits both desired user speech and unwanted background speech, including speech from persons other than a user and speech from a PA system. The speech recognition system is trained or otherwise programmed to identify wanted user speech that may be spoken concurrently with the background sounds. In an embodiment, during the pre-field-use phase the training or programming may be accomplished by having training listeners audit the pre-recorded sounds to identify the desired user speech. A processor-based learning system is then trained to duplicate the assessments made by the human listeners.
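An illustrative sketch only: a learning system fit to reproduce the human listeners' wanted/unwanted judgments on pre-recorded training audio, then used to keep only segments classified as user speech. The feature extraction and the choice of classifier are assumptions.

```python
from sklearn.linear_model import LogisticRegression

def train_speech_filter(segment_features, listener_labels):
    """listener_labels: 1 = desired user speech, 0 = background or PA speech."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(segment_features, listener_labels)   # duplicate the human assessments
    return clf

def keep_user_speech(clf, segment_features, segments):
    """Return only the segments the trained filter classifies as user speech."""
    return [s for s, p in zip(segments, clf.predict(segment_features)) if p == 1]
```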
-
Publication No.: US20240062744A1
Publication Date: 2024-02-22
Application No.: US18384009
Filing Date: 2023-10-26
Inventors: Jingjing LIU, Bihong Zhang
CPC Classification: G10L15/063, G10L15/02, G10L15/04, G10L19/16
Abstract: A real-time voice recognition method and a real-time voice recognition model training method are provided. The model training method includes: obtaining an audio feature sequence of sample voice data, the audio feature sequence comprising audio features of a plurality of audio frames of the sample voice data; inputting the audio feature sequence to an encoder of the real-time voice recognition model; chunking the audio feature sequence into a plurality of chunks by the encoder according to a mask matrix; encoding each of the chunks to obtain a hidden layer feature sequence of the sample voice data; decoding the hidden layer feature sequence by a decoder of the real-time voice recognition model to obtain a predicted recognition result for the sample voice data; and training the real-time voice recognition model based on the predicted recognition result and a real recognition result of the sample voice data.
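A sketch of the kind of chunking mask the abstract mentions: a boolean matrix that lets each frame attend only to frames in its own chunk, so encoding can proceed chunk by chunk in real time. The chunk size and the exact masking rule (for example, whether left context from earlier chunks is allowed) are assumptions here.

```python
import numpy as np

def chunk_mask(num_frames, chunk_size):
    """Block-diagonal mask: True where frame i may attend to frame j."""
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        mask[start:end, start:end] = True   # frames within a chunk see each other
    return mask

# usage: a 10-frame feature sequence split into chunks of 4 -> 4/4/2 block-diagonal mask
# m = chunk_mask(10, 4)
```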
-
Publication No.: US11908454B2
Publication Date: 2024-02-20
Application No.: US17539752
Filing Date: 2021-12-01
CPC Classification: G10L15/063, G06N3/08, G10L21/10
Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
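A hedged sketch of the pairing the abstract describes. The patent's "textogram" is not defined here, so the text-side representation below is a crude stand-in (a repeated one-hot character grid); only the overall shape of the idea, a frame-like matrix from audio and a frame-like matrix from text, is intended.

```python
import numpy as np

def spectrogram(audio, n_fft=512, hop=256):
    """Magnitude STFT of a 1-D audio signal, shape (frames, freq_bins)."""
    win = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(win * audio[i:i + n_fft]))
              for i in range(0, len(audio) - n_fft, hop)]
    return np.stack(frames)

def textogram_stub(text, repeat=4, alphabet="abcdefghijklmnopqrstuvwxyz "):
    """Stand-in text representation: one-hot rows per character, repeated in time."""
    rows = []
    for ch in text.lower():
        row = np.zeros(len(alphabet))
        if ch in alphabet:
            row[alphabet.index(ch)] = 1.0
        rows.extend([row] * repeat)             # crude per-character time expansion
    return np.stack(rows)                       # shape (pseudo-frames, symbols)
```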
-
Publication No.: US11902222B2
Publication Date: 2024-02-13
Application No.: US17170300
Filing Date: 2021-02-08
Applicant: GOOGLE LLC
Inventors: Asaf Aharoni, Eyal Segalis, Ofer Ron, Sasha Goldshtein, Tomer Amiaz, Razvan Mathias, Yaniv Leviathan
CPC Classification: H04L51/02, G06N20/00, G10L15/063, G10L15/10, G10L15/22
Abstract: Implementations are directed to updating a trained voice bot that is deployed for conducting conversations on behalf of a third-party. A third-party developer can interact with a voice bot development system that enables the third-party developer to train, update, validate, and monitor performance of the trained voice bot. In various implementations, the trained voice bot can be updated by updating a corpus of training instances that was initially utilized to train the voice bot, and updating the trained voice bot based on the updated corpus. In some implementations, the corpus of training instances may be updated in response to identifying occurrence(s) of behavioral error(s) of the trained voice bot while the conversations are being conducted on behalf of the third-party. In additional or alternative implementations, the corpus of training instances may be updated in response to determining the trained voice bot does not include a desired behavior.
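An illustrative sketch of the update loop the abstract describes: when behavioral errors are observed during deployed conversations, or a desired behavior is found to be missing, new training instances are added to the corpus and the voice bot is retrained on the updated corpus. All names are hypothetical.

```python
def update_voice_bot(voice_bot, corpus, observed_errors, new_instances, retrain):
    """Update the corpus of training instances and retrain the deployed voice bot."""
    if observed_errors or new_instances:
        corpus = corpus + new_instances   # updated corpus of training instances
        voice_bot = retrain(corpus)       # update the trained voice bot on the new corpus
    return voice_bot, corpus
```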