Audio signal processing method and system for enhancing a bone-conducted audio signal using a machine learning model

    Publication No.: US12080313B2

    Publication Date: 2024-09-03

    Application No.: US17852765

    Filing Date: 2022-06-29

    Abstract: The present disclosure relates to an audio signal processing method implemented by an audio system which includes at least an internal sensor, wherein the audio signal processing method includes: measuring, by the internal sensor, a voice signal emitted by a user which propagates via bone-conduction to the internal sensor, thereby producing a bone-conducted audio signal; processing the bone-conducted audio signal by a machine learning model, wherein the machine learning model is previously trained to produce a predicted air-conducted audio signal which corresponds to a prediction of an audio signal that would be produced by measuring the same voice signal propagating via air-conduction to a microphone, by increasing a spectral bandwidth of the bone-conducted audio signal and/or by reshaping a spectrum of the bone-conducted audio signal; and producing an internal signal for the internal sensor based on the predicted air-conducted audio signal.
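    The bandwidth-extension step described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's model: bone conduction attenuates high frequencies, so a trained mapping predicts the missing high-band spectral magnitudes from the low band. The linear `weights` mapping is a hypothetical stand-in for the machine learning model.

```python
def enhance_bone_conducted(spectrum, cutoff_bin, weights):
    """Predict high-band magnitudes from the low band, then concatenate.

    spectrum: magnitude spectrum of the bone-conducted signal (high band ~0)
    cutoff_bin: bin index above which bone conduction carries little energy
    weights: learned low-band -> high-band mapping (one row per missing bin)
    """
    low_band = spectrum[:cutoff_bin]
    # Each missing high-band bin is predicted as a weighted sum of low-band bins,
    # reshaping the spectrum toward what an air-conduction microphone would capture.
    high_band = [sum(w * m for w, m in zip(row, low_band)) for row in weights]
    return low_band + high_band

# Toy example: 4 low-band bins present, 2 high-band bins to reconstruct.
spec = [1.0, 0.8, 0.4, 0.2, 0.0, 0.0]
w = [[0.1, 0.1, 0.2, 0.4], [0.05, 0.05, 0.1, 0.2]]
enhanced = enhance_bone_conducted(spec, 4, w)
```

    In a real system the mapping would operate on STFT frames and be learned from paired bone-conducted and air-conducted recordings; here it only demonstrates the shape of the computation.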

    Attributing generated audio contents to training examples

    Publication No.: US12080277B1

    Publication Date: 2024-09-03

    Application No.: US18387692

    Filing Date: 2023-11-07

    IPC Classes: G10L15/06 G10L15/18

    Abstract: Systems, methods and non-transitory computer readable media for attributing generated audio contents to training examples are provided. A first audio content generated using a generative model may be received. The generative model may be a result of training a machine learning model using training examples. Each training example may be associated with a respective audio content. Properties of the first audio content may be determined. For each training example of the training examples, the respective audio content may be analyzed to determine properties of the respective audio content. The properties of the first audio content and the properties of the audio contents associated with the training examples may be used to attribute the first audio content to a subgroup of the training examples. A respective data-record associated with a source associated with the training examples of the subgroup may be updated based on the attribution.
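    The attribution flow in this abstract can be sketched as a nearest-neighbor search over property vectors, followed by updating a per-source data-record. The Euclidean distance, the `top_k` subgroup size, and the credit counter are all hypothetical simplifications; the patent does not specify how properties are compared.

```python
def attribute(generated_props, training_examples, top_k=2):
    """Return the subgroup of training examples whose audio properties are
    closest to the generated content's properties."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(training_examples,
                    key=lambda ex: distance(generated_props, ex["props"]))
    return ranked[:top_k]

examples = [
    {"source": "artist_a", "props": [0.9, 0.1]},
    {"source": "artist_b", "props": [0.2, 0.8]},
    {"source": "artist_a", "props": [0.85, 0.15]},
]
subgroup = attribute([0.88, 0.12], examples)

# Update a data-record per source for the attributed subgroup
# (here, a simple credit counter standing in for the patent's data-record).
credits = {}
for ex in subgroup:
    credits[ex["source"]] = credits.get(ex["source"], 0) + 1
```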

    Asynchronous optimization for sequence training of neural networks

    Publication No.: US12073823B2

    Publication Date: 2024-08-27

    Application No.: US18506540

    Filing Date: 2023-11-10

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
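    The asynchronous pattern the claims describe, where each sequence-training replica fetches the current parameters and optimizes them on its own batch without waiting for the other, can be sketched like this. The quadratic toy loss and the single shared parameter list are illustrative assumptions only.

```python
def optimize(params, batch, lr=0.1):
    """One replica's update step: pull parameters toward the batch mean.

    A stand-in for sequence-discriminative training on a batch of
    speech-feature frames; real systems compute gradients of a sequence loss.
    """
    target = sum(batch) / len(batch)
    return [p - lr * (p - target) for p in params]

shared = [0.0, 0.0]   # parameter-server copy of the neural network parameters

# First replica: obtains parameters, optimizes on its batch of training frames.
shared = optimize(shared, batch=[1.0, 3.0])

# Second replica: obtains the (possibly stale) parameters and optimizes on its
# own batch; neither replica blocks on the other.
shared = optimize(shared, batch=[5.0, 7.0])
```

    The point of the asynchrony is throughput: each replica reads whatever parameters are current when it starts, so updates may be computed from slightly stale values.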

    Diffusion Models for Generation of Audio Data Based on Descriptive Textual Prompts

    Publication No.: US20240282294A1

    Publication Date: 2024-08-22

    Application No.: US18651296

    Filing Date: 2024-04-30

    Applicant: Google LLC

    IPC Classes: G10L15/06 G10L15/16

    CPC Classes: G10L15/063 G10L15/16

    Abstract: A corpus of textual data is generated with a machine-learned text generation model. The corpus of textual data includes a plurality of sentences. Each sentence is descriptive of a type of audio. For each of a plurality of audio recordings, the audio recording is processed with a machine-learned audio classification model to obtain training data including the audio recording and one or more sentences of the plurality of sentences closest to the audio recording within a joint audio-text embedding space of the machine-learned audio classification model. The sentence(s) are processed with a machine-learned generation model to obtain an intermediate representation of the one or more sentences. The intermediate representation is processed with a machine-learned cascaded diffusion model to obtain audio data. The machine-learned cascaded diffusion model is trained based on a difference between the audio data and the audio recording.
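    The pairing step of this pipeline, finding the generated sentences closest to each recording in a joint audio-text embedding space, can be sketched with cosine similarity. The two-dimensional embeddings and sentence texts below are invented for illustration; the abstract does not specify the similarity measure.

```python
def closest_sentences(audio_emb, sentences, k=1):
    """Return the k sentence texts whose embeddings are nearest the
    recording's embedding in the shared audio-text space."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    ranked = sorted(sentences, key=lambda s: cosine(audio_emb, s["emb"]),
                    reverse=True)
    return [s["text"] for s in ranked[:k]]

# Hypothetical generated sentences with their text-embedding coordinates.
sentences = [
    {"text": "a dog barking in the distance", "emb": [0.9, 0.1]},
    {"text": "soft rain on a tin roof", "emb": [0.1, 0.9]},
]
# A recording whose audio embedding lies near the first sentence.
paired = closest_sentences([0.8, 0.2], sentences)
```

    The resulting (recording, sentence) pairs become training data: the sentence is mapped to an intermediate representation, and the cascaded diffusion model learns to reconstruct the recording from it.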

    Training Speech Recognizers Based On Biased Transcriptions

    Publication No.: US20240257799A1

    Publication Date: 2024-08-01

    Application No.: US18161608

    Filing Date: 2023-01-30

    Applicant: Google LLC

    IPC Classes: G10L15/06 G10L15/08 G10L15/22

    Abstract: A method includes receiving a biased transcription for a voice command spoken by a user and captured by a user device, the biased transcription biased to include a biasing phrase from a set of biasing phrases specific to the user. The method also includes instructing an application executing on the user device to perform an action specified by the biased transcription for the voice command, and receiving one or more user behavior signals responsive to the application performing the action specified by the biased transcription. The method further includes generating, as output from a confidence model, a confidence score of the biased transcription based on the one or more user behavior signals input to the confidence model and, based on the confidence score output from the confidence model, training a speech recognizer on the biased transcription.
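    The confidence-gating step of this method can be sketched as follows. The specific behavior signals, the logistic confidence model, and the 0.8 threshold are all hypothetical; the abstract only says a confidence model maps behavior signals to a score that gates training.

```python
import math

def confidence_score(signals, weights, bias=0.0):
    """Toy confidence model: weighted sum of user behavior signals
    passed through a sigmoid to yield a score in (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, signals))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical behavior signals observed after the action ran:
# user did not cancel (1.0), did not repeat the command (1.0),
# engaged with the result (1.0).
signals = [1.0, 1.0, 1.0]
score = confidence_score(signals, weights=[1.0, 1.0, 1.0])

# Only train the recognizer on the biased transcription when the
# behavior signals imply the transcription was likely correct.
train_on_biased = score > 0.8
```

    Gating on behavior-derived confidence keeps mistranscribed commands, the ones users cancel or immediately retry, out of the recognizer's training data.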