-
Publication No.: US12051513B2
Publication date: 2024-07-30
Application No.: US18347382
Filing date: 2023-07-05
Applicant: Canary Speech, LLC
IPC classification: G16H80/00, A61B5/00, A61B5/11, G06N3/08, G06N7/01, G06N20/10, G10L25/66, G16H10/20, G16H40/67, G16H50/20, G16H50/50, G06F111/10, G10L15/02, G10L15/06, G10L15/22
CPC classification: G16H80/00, A61B5/1123, A61B5/4088, A61B5/4803, A61B5/7267, G06N3/08, G06N7/01, G06N20/10, G10L25/66, G16H10/20, G16H40/67, G16H50/20, G16H50/50, G06F2111/10, G10L15/02, G10L15/063, G10L15/22
Abstract: Apparatuses, systems, methods, and computer program products are disclosed for medical assessment based on voice. A query module is configured to audibly question a user from an electronic display screen and/or a speaker of a computing device with one or more open ended questions. A response module is configured to receive a conversational verbal response of a user from a microphone of a computing device in response to one or more open ended questions. A detection module is configured to provide a machine learning assessment for a user of a medical condition based on a machine learning analysis of a received conversational verbal response of the user.
-
Publication No.: US12051408B2
Publication date: 2024-07-30
Application No.: US16838966
Filing date: 2020-04-02
Applicant: Google LLC
Inventor: Matthew Sharifi
IPC classification: G10L15/22, G06F3/16, G10L15/02, G10L15/06, G10L15/08, G10L15/26, G10L15/28, G10L17/22
CPC classification: G10L15/22, G06F3/167, G10L15/02, G10L15/063, G10L15/08, G10L15/26, G10L15/285, G10L17/22, G10L2015/088, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for designating certain voice commands as hotwords. The methods, systems, and apparatus include actions of receiving a hotword followed by a voice command. Additional actions include determining that the voice command satisfies one or more predetermined criteria associated with designating the voice command as a hotword, where a voice command that is designated as a hotword is treated as a voice input regardless of whether the voice command is preceded by another hotword. Further actions include, in response to determining that the voice command satisfies one or more predetermined criteria associated with designating the voice command as a hotword, designating the voice command as a hotword.
-
Publication No.: US12039988B1
Publication date: 2024-07-16
Application No.: US18424695
Filing date: 2024-01-26
Applicant: Nantong University
Inventors: Shibing Zhang, Jianrong Wu
CPC classification: G10L21/02, G10L15/063, G10L15/20, G10L25/51, G10L2015/0631
Abstract: The present application discloses a method and a system for saturation diving heliumspeech unscrambling based on multi-objective optimization. In a system including at least a diver and a filter, a working language phonetic symbol library and a common working word library for divers are constructed. The divers read them one by one, and a phonetic symbol standard speech library, a phonetic symbol heliumspeech library and a common working word speech library are generated. The filter uses the multi-objective optimization algorithm to design its impulse response coefficients, corrects and unscrambles the tagged and sampled heliumspeech signal word by word, and continuously updates the impulse response coefficients to complete the perfect heliumspeech unscrambling.
-
Publication No.: US12039982B2
Publication date: 2024-07-16
Application No.: US17601662
Filing date: 2020-04-06
Applicant: Google LLC
Inventors: Laurent El Shafey, Hagen Soltau, Izhak Shafran
CPC classification: G10L17/18, G10L15/22, G10L15/26, G10L15/30, G10L15/063
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio data using neural networks.
-
Publication No.: US20240233744A9
Publication date: 2024-07-11
Application No.: US18275950
Filing date: 2021-02-08
Inventors: Naoki MAKISHIMA, Ryo MASUMURA
IPC classification: G10L21/028, G10L15/06, G10L15/25
CPC classification: G10L21/028, G10L15/063, G10L15/25
Abstract: A mixed acoustic signal including sound emitted from a plurality of sound sources and sound source video signals representing at least one video of the plurality of sound sources are received as inputs, and at least a separated signal including a signal representing a target sound emitted from one sound source represented by the video is acquired. However, at least the separated signal is acquired using properties of the sound source that affects sound emitted by the sound source acquired from the video and/or features of a structure used for the sound source to emit the sound.
-
Publication No.: US20240233715A1
Publication date: 2024-07-11
Application No.: US18118282
Filing date: 2023-03-07
Applicant: Drift.com, Inc.
CPC classification: G10L15/1815, G06F16/3344, G06N20/00, G10L15/063
Abstract: A technique for semantic search and retrieval that is event-based, wherein an event is composed of a sequence of observations that are user speech or physical actions. Using a first set of conversations, a machine learning model is trained against groupings of utterances therein to generate a speech act classifier. Observation sequences therein are organized into groupings of events and configured for subsequent event recognition. A set of second (unannotated) conversations are then received. The set of second conversations is evaluated using the speech act classifier and information retrieved from the event recognition to generate event-level metadata that comprises, for each utterance or physical action within an event, one or more associated tags. In response to a query, a search is performed against the metadata. Because the metadata is derived from event recognition, the search is performed against events learned from the set of first conversations. One or more conversation fragments that, from an event-based perspective, are semantically-relevant to the query, are returned.
-
Publication No.: US12033621B2
Publication date: 2024-07-09
Application No.: US17231945
Filing date: 2021-04-15
Inventors: Dan Su, Tianxiao Fu, Min Luo, Qi Chen, Yulu Zhang, Lin Luo
IPC classification: G10L15/187, G10L15/00, G10L15/02, G10L15/06, G10L15/22
CPC classification: G10L15/187, G10L15/005, G10L15/02, G10L15/063, G10L15/22, G10L2015/025
Abstract: A method for speech recognition based on language adaptivity comprises obtaining voice data of a user. The method also comprises extracting, based on the obtained voice data, a phoneme feature representing pronunciation phoneme information. The phoneme feature is input to a pre-trained language discrimination model that is pre-trained based on a multilingual corpus. A language discrimination result corresponding to the phoneme feature and in accordance with the language discrimination model is obtained. The method also comprises obtaining a speech recognition result of the voice data based on a language acoustic model of a language corresponding to the language discrimination result. The method further comprises determining a speech recognition result of the voice data based on a language acoustic model of a language corresponding to the language discrimination result.
-
Publication No.: US20240221750A1
Publication date: 2024-07-04
Application No.: US18610233
Filing date: 2024-03-19
Applicant: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
CPC classification: G10L15/22, G10L15/02, G10L15/063, G10L15/18, G10L19/00, G10L2015/025, G10L2015/088, G10L15/142, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
-
Publication No.: US20240221727A1
Publication date: 2024-07-04
Application No.: US18266432
Filing date: 2022-09-01
Inventors: Lanhua YOU, Lei JIA, Qi ZHANG, Zhengxiang JIANG
CPC classification: G10L15/063, G10L15/01, G10L15/02, G10L15/16
Abstract: The present disclosure provides a voice recognition model training method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence technology, and in particular to fields such as deep learning and voice recognition. The specific implementation scheme includes constructing a negative sample according to a positive sample to obtain a target negative sample for constraining a voice decoding path; obtaining training data according to the positive sample and the target negative sample; and training a first voice recognition model according to the training data to obtain a second voice recognition model.
-
Publication No.: US12027156B2
Publication date: 2024-07-02
Application No.: US17677921
Filing date: 2022-02-22
Inventors: Aidan Smyth, Ashutosh Pandey, Avik Santra
CPC classification: G10L15/10, G10L15/04, G10L15/063, G10L15/22, G10L25/18, G10L2015/088, G10L2015/223
Abstract: Described are techniques for noise-robust and speaker-independent keyword spotting (KWS) in an input audio signal that contains keywords used to activate voice-based human-computer interactions. A KWS system may combine the latent representation generated by a denoising autoencoder (DAE) with audio features extracted from the audio signal using a machine learning approach. The DAE may be a discriminative DAE trained with a quadruplet loss metric learning approach to create a highly-separable latent representation of the audio signal in the audio input feature space. In one aspect, spectral characteristics of the audio signal such as Log-Mel features are combined with the latent representation generated by a quadruplet loss variational DAE (QVDAE) as input to a DNN KWS classifier. The KWS system improves keyword classification accuracy versus using extracted spectral features alone, non-discriminative DAE latent representations alone, or the extracted spectral features combined with the non-discriminative DAE latent representations in a KWS classifier.