-
Publication No.: US12080313B2
Publication Date: 2024-09-03
Application No.: US17852765
Filing Date: 2022-06-29
Inventors: Jean-Marc Luneau, Stijn Robben
IPC Classes: H04R3/04, G10L15/06, G10L21/0208
CPC Classes: G10L21/0208, G10L15/063, H04R3/04
Abstract: The present disclosure relates to an audio signal processing method implemented by an audio system that includes at least an internal sensor. The method includes: measuring, by the internal sensor, a voice signal emitted by a user that propagates via bone conduction to the internal sensor, thereby producing a bone-conducted audio signal; processing the bone-conducted audio signal with a machine learning model previously trained to produce a predicted air-conducted audio signal, i.e. a prediction of the audio signal that would be produced by measuring the same voice signal propagating via air conduction to a microphone, by increasing the spectral bandwidth of the bone-conducted audio signal and/or by reshaping its spectrum; and producing an internal signal for the internal sensor based on the predicted air-conducted audio signal.
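The bandwidth-extension step can be illustrated with a deliberately simple heuristic. The patent's learned mapping is not public, so the `extend_bandwidth` helper below and its attenuation factor are illustrative assumptions only: it synthesizes the high band a bone-conduction path loses by mirroring the measured low band.

```python
def extend_bandwidth(low_band, attenuation=0.3):
    """Crude spectral bandwidth extension: synthesize the missing high
    band of a bone-conducted magnitude spectrum by mirroring the measured
    low band and attenuating it. A trained model, as in the abstract,
    would replace this fixed heuristic with a learned mapping."""
    high_band = [attenuation * m for m in reversed(low_band)]
    return list(low_band) + high_band
```

In practice the learned model would also reshape the low band itself, not merely append a synthetic high band.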
-
Publication No.: US12080277B1
Publication Date: 2024-09-03
Application No.: US18387692
Filing Date: 2023-11-07
Inventors: Yair Adato, Michael Feinstein, Nimrod Sarid, Ron Mokady, Eyal Gutflaish, Vered Horesh-Yaniv
CPC Classes: G10L15/063, G10L15/18, G10L2015/0631
Abstract: Systems, methods, and non-transitory computer-readable media for attributing generated audio contents to training examples are provided. A first audio content generated using a generative model may be received. The generative model may be a result of training a machine learning model using training examples, each associated with a respective audio content. Properties of the first audio content may be determined. For each training example, the respective audio content may be analyzed to determine its properties. The properties of the first audio content and the properties of the audio contents associated with the training examples may be used to attribute the first audio content to a subgroup of the training examples. A data record associated with a source of the training examples in the subgroup may be updated based on the attribution.
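One way to read the attribution step is as a nearest-neighbor search in a property space. The sketch below is an assumption about how such a subgroup could be selected (the function name, the distance metric, and the choice of `k` are all illustrative, not the patented method):

```python
def attribute_to_subgroup(gen_props, example_props, k=2):
    """Attribute a generated item to the k training examples whose
    property vectors are closest to it (squared Euclidean distance).
    Returns the indices of the chosen subgroup, nearest first."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(range(len(example_props)),
                    key=lambda i: dist2(gen_props, example_props[i]))
    return ranked[:k]
```

The returned indices would then drive the data-record update for the corresponding sources, e.g. incrementing a per-source attribution counter.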
-
Publication No.: US20240290343A1
Publication Date: 2024-08-29
Application No.: US18176252
Filing Date: 2023-02-28
Applicant: Intel Corporation
Inventors: Hector Alfonso Cordourier Maruri, Himanshu Bhalla, Georg Stemmer, Sinem Aslan, Julio Cesar Zamora, Jose Rodrigo Camacho Perez, Paulo Lopez Meyer, Alejandro Ibarra Von Borstel, Jose Israel Torres Ortega, Juan Antonio Del Hoyo Ontiveros
IPC Classes: G10L25/30, G06N3/0499, G10L15/06, G10L25/12
CPC Classes: G10L25/30, G06N3/0499, G10L15/063, G10L25/12, G10L2015/0635
Abstract: Methods, apparatus, systems, and articles of manufacture for real-time voice type detection in audio data are disclosed. An example non-transitory computer-readable medium disclosed herein includes instructions which, when executed, cause one or more processors to at least: identify a first vocal effort of a first audio segment of first audio data and a second vocal effort of a second audio segment of the first audio data; train a neural network using training data that includes the first audio segment, the first vocal effort, the second audio segment, and the second vocal effort; and deploy the neural network to distinguish between the first vocal effort and the second vocal effort.
-
Publication No.: US20240289563A1
Publication Date: 2024-08-29
Application No.: US18589358
Filing Date: 2024-02-27
Applicant: GOOGLE LLC
Inventors: Michelle Tadmor Ramanovich, Eliya Nachmani, Alon Levkovitch, Byungha Chun, Yifan Ding, Nadav Bar, Chulayuth Asawaroengchai
CPC Classes: G06F40/58, G10L15/005, G10L15/063, G10L25/18, G10L2015/0635
Abstract: Techniques for training and/or utilizing a Speech-To-Speech Translation (S2ST) system that can generate, based on processing source audio data capturing a spoken utterance in a source language, target audio data that includes a synthetic spoken utterance in a target language corresponding, both linguistically and para-linguistically, to the spoken utterance in the source language. Implementations directed to training the S2ST system utilize an unsupervised approach with monolingual speech data.
-
Publication No.: US12073824B2
Publication Date: 2024-08-27
Application No.: US17616135
Filing Date: 2020-12-03
Applicant: GOOGLE LLC
Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-Yiin Chang, Wei Li
CPC Classes: G10L15/16, G06N3/08, G10L15/05, G10L15/063, G10L15/22, G10L2015/0635
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of the utterance; for example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion used to revise the streaming candidate recognition(s) and generate a text representation of the utterance; for example, the second-pass portion can include a Listen, Attend and Spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
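The shared-encoder topology can be sketched as three toy functions: both passes consume the same encoder output, the first pass emits streaming hypotheses, and the second pass rescores them with full context. The encoding and scoring below are placeholders for neural networks, not the patented decoders:

```python
def shared_encoder(audio_frames):
    """Stand-in encoder: one 'feature' per input frame. A real encoder
    emits learned representations consumed by both decoders."""
    return [f * 2.0 for f in audio_frames]

def first_pass_rnnt(features):
    """Streaming first pass: emit one (text, score) candidate per prefix
    of the feature stream, mimicking incremental RNN-T decoding."""
    return [("hyp-%d" % n, sum(features[:n + 1]))
            for n in range(len(features))]

def second_pass_las(features, candidates):
    """Full-context second pass: rescore the streamed candidates and
    return the text of the best one (here, simply the highest score)."""
    return max(candidates, key=lambda c: c[1])[0]
```

The point of sharing the encoder is that the second pass adds quality without re-encoding the audio, which is what makes the two-pass design viable on-device.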
-
Publication No.: US12073823B2
Publication Date: 2024-08-27
Application No.: US18506540
Filing Date: 2023-11-10
Applicant: Google LLC
Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
IPC Classes: G10L15/06, G06N3/045, G10L15/16, G10L15/183
CPC Classes: G10L15/063, G06N3/045, G10L15/16, G10L15/183
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for: obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first model, one or more first neural network parameters; determining, by the first model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the second neural network parameters.
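The core step each model repeats, taking (batch, current parameters) to optimized parameters, can be sketched as a single update function applied independently by each model. The gradient proxy below is a placeholder assumption; real sequence training computes sequence-level (e.g. lattice-based) gradients:

```python
def optimize_params(params, batch, lr=0.1):
    """One parameter-optimization step: derive a scalar gradient proxy
    from the batch (here, its mean) and apply it to every parameter.
    Each sequence-training model runs this on its own batch and its own
    parameter copy, as in the abstract's first/second model split."""
    grad = sum(batch) / len(batch)
    return [p - lr * grad for p in params]
```

Running this with two different batches and two parameter copies reproduces the abstract's structure: the two models' updates are independent and could proceed in parallel.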
-
Publication No.: US12073818B2
Publication Date: 2024-08-27
Application No.: US17197740
Filing Date: 2021-03-10
IPC Classes: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00
CPC Classes: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303
Abstract: A method, computer program product, and computing system for receiving feature-based voice data. One or more data augmentation characteristics may be received. One or more augmentations of the feature-based voice data may be generated, via a machine learning model, based, at least in part, upon the feature-based voice data and the one or more data augmentation characteristics.
-
Publication No.: US20240282294A1
Publication Date: 2024-08-22
Application No.: US18651296
Filing Date: 2024-04-30
Applicant: Google LLC
Inventors: Qingqing Huang, Daniel Sung-Joon Park, Aren Jansen, Timo Immanuel Denk, Yue Li, Ravi Ganti, Dan Ellis, Tao Wang, Wei Han, Joonseok Lee
CPC Classes: G10L15/063, G10L15/16
Abstract: A corpus of textual data is generated with a machine-learned text generation model. The corpus includes a plurality of sentences, each descriptive of a type of audio. For each of a plurality of audio recordings, the recording is processed with a machine-learned audio classification model to obtain training data comprising the recording and the one or more sentences closest to it within the joint audio-text embedding space of the classification model. The sentence(s) are processed with a machine-learned generation model to obtain an intermediate representation, which is then processed with a machine-learned cascaded diffusion model to obtain audio data. The cascaded diffusion model is trained based on a difference between the generated audio data and the audio recording.
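The "closest sentences in a joint embedding space" step is, mechanically, a similarity search over embedding vectors. A minimal sketch, assuming cosine similarity as the closeness measure (the abstract does not specify the metric, and the embeddings here are toy vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def closest_sentences(audio_emb, sentence_embs, k=1):
    """Return indices of the k caption sentences whose embeddings are
    most similar to the audio embedding in the shared space."""
    ranked = sorted(range(len(sentence_embs)),
                    key=lambda i: cosine(audio_emb, sentence_embs[i]),
                    reverse=True)
    return ranked[:k]
```

The selected sentence indices pair each recording with pseudo-captions, which is what lets the diffusion model train without human-written audio descriptions.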
-
Publication No.: US12067987B2
Publication Date: 2024-08-20
Application No.: US18427538
Filing Date: 2024-01-30
Inventors: Linhao Dong, Zejun Ma
CPC Classes: G10L15/22, G10L15/063
Abstract: The present disclosure describes a method and device for generating acoustic features, speech model training, and speech recognition. The acoustic information vector and the information weight of the current speech frame are acquired; the accumulated information weight of the current frame is then obtained from the accumulated information weight of the previous frame, the retention rate of the current frame, and the information weight of the current frame. The retention rate is the difference between 1 and a leakage rate.
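The recurrence stated in the abstract, accumulated weight = retention × previous accumulated weight + current weight, with retention = 1 − leakage, can be written directly. The leakage value and zero initial state below are illustrative assumptions:

```python
def accumulated_weights(frame_weights, leakage=0.1):
    """Leaky accumulation of per-frame information weights:
    acc[t] = (1 - leakage) * acc[t-1] + w[t], with acc[-1] = 0.
    Returns the accumulated weight after each frame."""
    retention = 1.0 - leakage
    acc, out = 0.0, []
    for w in frame_weights:
        acc = retention * acc + w
        out.append(acc)
    return out
```

Because retention < 1, old frames decay geometrically, so the accumulated weight tracks recent informative frames rather than growing without bound.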
-
Publication No.: US20240257799A1
Publication Date: 2024-08-01
Application No.: US18161608
Filing Date: 2023-01-30
Applicant: Google LLC
Inventors: Dragan Zivkovic, Agoston Weisz
CPC Classes: G10L15/063, G10L15/08, G10L15/22, G10L2015/0636, G10L2015/088, G10L2015/223
Abstract: A method includes receiving a biased transcription for a voice command spoken by a user and captured by a user device, the transcription biased to include a biasing phrase from a set of biasing phrases specific to the user. The method also includes instructing an application executing on the user device to perform an action specified by the biased transcription, and receiving one or more user behavior signals responsive to the application performing that action. The method further includes generating, as output from a confidence model, a confidence score for the biased transcription based on the one or more user behavior signals input to the confidence model and, based on the confidence score, training a speech recognizer on the biased transcription.
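The confidence-gating idea can be sketched as a tiny logistic scorer over behavior signals. The signal semantics, weights, and threshold below are illustrative assumptions, not the patent's actual confidence model:

```python
import math

def confidence_score(behavior_signals, weights, bias=0.0):
    """Squash a weighted sum of user behavior signals (e.g. 'user kept
    the result' vs. 'user undid the action') into a [0, 1] confidence."""
    z = bias + sum(w * s for w, s in zip(weights, behavior_signals))
    return 1.0 / (1.0 + math.exp(-z))

def select_for_training(score, threshold=0.5):
    """Gate: only biased transcriptions the model is confident about
    are fed back into speech recognizer training."""
    return score >= threshold
```

The gate is what keeps a wrongly biased transcription (one the user immediately corrected) from reinforcing itself in the recognizer.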