Audio signal processing method and system for enhancing a bone-conducted audio signal using a machine learning model

    Publication No.: US12080313B2

    Publication Date: 2024-09-03

    Application No.: US17852765

    Filing Date: 2022-06-29

    Abstract: The present disclosure relates to an audio signal processing method implemented by an audio system which includes at least an internal sensor, wherein the audio signal processing method includes: measuring, by the internal sensor, a voice signal emitted by a user which propagates via bone-conduction to the internal sensor, thereby producing a bone-conducted audio signal; processing the bone-conducted audio signal by a machine learning model, wherein the machine learning model is previously trained to produce a predicted air-conducted audio signal which corresponds to a prediction of an audio signal that would be produced by measuring the same voice signal propagating via air-conduction to a microphone, by increasing a spectral bandwidth of the bone-conducted audio signal and/or by reshaping a spectrum of the bone-conducted audio signal; and producing an internal signal for the internal sensor based on the predicted air-conducted audio signal.
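    The bandwidth-extension step described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's model: bone conduction attenuates high frequencies, so a trained mapping predicts the missing high-band spectral magnitudes from the low band. The linear `weights` mapping is a hypothetical stand-in for the machine learning model.

```python
def enhance_bone_conducted(spectrum, cutoff_bin, weights):
    """Predict high-band magnitudes from the low band, then concatenate.

    spectrum: magnitude spectrum of the bone-conducted signal (high band ~0)
    cutoff_bin: bin index above which bone conduction carries little energy
    weights: learned low-band -> high-band mapping (one row per missing bin)
    """
    low_band = spectrum[:cutoff_bin]
    # Each missing high-band bin is predicted as a weighted sum of low-band bins,
    # reshaping the spectrum toward what an air-conduction microphone would capture.
    high_band = [sum(w * m for w, m in zip(row, low_band)) for row in weights]
    return low_band + high_band

# Toy example: 4 low-band bins present, 2 high-band bins to reconstruct.
spec = [1.0, 0.8, 0.4, 0.2, 0.0, 0.0]
w = [[0.1, 0.1, 0.2, 0.4], [0.05, 0.05, 0.1, 0.2]]
enhanced = enhance_bone_conducted(spec, 4, w)
```

    In a real system the mapping would operate on STFT frames and be learned from paired bone-conducted and air-conducted recordings; here it only demonstrates the shape of the computation.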

    Attributing generated audio contents to training examples

    Publication No.: US12080277B1

    Publication Date: 2024-09-03

    Application No.: US18387692

    Filing Date: 2023-11-07

    IPC Classes: G10L15/06 G10L15/18

    Abstract: Systems, methods and non-transitory computer readable media for attributing generated audio contents to training examples are provided. A first audio content generated using a generative model may be received. The generative model may be a result of training a machine learning model using training examples. Each training example may be associated with a respective audio content. Properties of the first audio content may be determined. For each training example of the training examples, the respective audio content may be analyzed to determine properties of the respective audio content. The properties of the first audio content and the properties of the audio contents associated with the training examples may be used to attribute the first audio content to a subgroup of the training examples. A respective data-record associated with a source associated with the training examples of the subgroup may be updated based on the attribution.
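    The attribution flow in this abstract can be sketched as a nearest-neighbor search over property vectors, followed by updating a per-source data-record. The Euclidean distance, the `top_k` subgroup size, and the credit counter are all hypothetical simplifications; the patent does not specify how properties are compared.

```python
def attribute(generated_props, training_examples, top_k=2):
    """Return the subgroup of training examples whose audio properties are
    closest to the generated content's properties."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(training_examples,
                    key=lambda ex: distance(generated_props, ex["props"]))
    return ranked[:top_k]

examples = [
    {"source": "artist_a", "props": [0.9, 0.1]},
    {"source": "artist_b", "props": [0.2, 0.8]},
    {"source": "artist_a", "props": [0.85, 0.15]},
]
subgroup = attribute([0.88, 0.12], examples)

# Update a data-record per source for the attributed subgroup
# (here, a simple credit counter standing in for the patent's data-record).
credits = {}
for ex in subgroup:
    credits[ex["source"]] = credits.get(ex["source"], 0) + 1
```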

    Asynchronous optimization for sequence training of neural networks

    Publication No.: US12073823B2

    Publication Date: 2024-08-27

    Application No.: US18506540

    Filing Date: 2023-11-10

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
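    The asynchronous pattern the claims describe, where each sequence-training replica fetches the current parameters and optimizes them on its own batch without waiting for the other, can be sketched like this. The quadratic toy loss and the single shared parameter list are illustrative assumptions only.

```python
def optimize(params, batch, lr=0.1):
    """One replica's update step: pull parameters toward the batch mean.

    A stand-in for sequence-discriminative training on a batch of
    speech-feature frames; real systems compute gradients of a sequence loss.
    """
    target = sum(batch) / len(batch)
    return [p - lr * (p - target) for p in params]

shared = [0.0, 0.0]   # parameter-server copy of the neural network parameters

# First replica: obtains parameters, optimizes on its batch of training frames.
shared = optimize(shared, batch=[1.0, 3.0])

# Second replica: obtains the (possibly stale) parameters and optimizes on its
# own batch; neither replica blocks on the other.
shared = optimize(shared, batch=[5.0, 7.0])
```

    The point of the asynchrony is throughput: each replica reads whatever parameters are current when it starts, so updates may be computed from slightly stale values.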

    Diffusion Models for Generation of Audio Data Based on Descriptive Textual Prompts

    Publication No.: US20240282294A1

    Publication Date: 2024-08-22

    Application No.: US18651296

    Filing Date: 2024-04-30

    Applicant: Google LLC

    IPC Classes: G10L15/06 G10L15/16

    CPC Classes: G10L15/063 G10L15/16

    Abstract: A corpus of textual data is generated with a machine-learned text generation model. The corpus of textual data includes a plurality of sentences. Each sentence is descriptive of a type of audio. For each of a plurality of audio recordings, the audio recording is processed with a machine-learned audio classification model to obtain training data including the audio recording and one or more sentences of the plurality of sentences closest to the audio recording within a joint audio-text embedding space of the machine-learned audio classification model. The sentence(s) are processed with a machine-learned generation model to obtain an intermediate representation of the one or more sentences. The intermediate representation is processed with a machine-learned cascaded diffusion model to obtain audio data. The machine-learned cascaded diffusion model is trained based on a difference between the audio data and the audio recording.
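    The pairing step of this pipeline, finding the generated sentences closest to each recording in a joint audio-text embedding space, can be sketched with cosine similarity. The two-dimensional embeddings and sentence texts below are invented for illustration; the abstract does not specify the similarity measure.

```python
def closest_sentences(audio_emb, sentences, k=1):
    """Return the k sentence texts whose embeddings are nearest the
    recording's embedding in the shared audio-text space."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    ranked = sorted(sentences, key=lambda s: cosine(audio_emb, s["emb"]),
                    reverse=True)
    return [s["text"] for s in ranked[:k]]

# Hypothetical generated sentences with their text-embedding coordinates.
sentences = [
    {"text": "a dog barking in the distance", "emb": [0.9, 0.1]},
    {"text": "soft rain on a tin roof", "emb": [0.1, 0.9]},
]
# A recording whose audio embedding lies near the first sentence.
paired = closest_sentences([0.8, 0.2], sentences)
```

    The resulting (recording, sentence) pairs become training data: the sentence is mapped to an intermediate representation, and the cascaded diffusion model learns to reconstruct the recording from it.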

    Training Speech Recognizers Based On Biased Transcriptions

    Publication No.: US20240257799A1

    Publication Date: 2024-08-01

    Application No.: US18161608

    Filing Date: 2023-01-30

    Applicant: Google LLC

    IPC Classes: G10L15/06 G10L15/08 G10L15/22

    Abstract: A method includes receiving a biased transcription for a voice command spoken by a user and captured by a user device, the biased transcription biased to include a biasing phrase from a set of biasing phrases specific to the user. The method also includes instructing an application executing on the user device to perform an action specified by the biased transcription for the voice command, and receiving one or more user behavior signals responsive to the application performing the action specified by the biased transcription. The method further includes generating, as output from a confidence model, a confidence score of the biased transcription based on the one or more user behavior signals input to the confidence model and, based on the confidence score output from the confidence model, training a speech recognizer on the biased transcription.
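    The confidence-gating step of this method can be sketched as follows. The specific behavior signals, the logistic confidence model, and the 0.8 threshold are all hypothetical; the abstract only says a confidence model maps behavior signals to a score that gates training.

```python
import math

def confidence_score(signals, weights, bias=0.0):
    """Toy confidence model: weighted sum of user behavior signals
    passed through a sigmoid to yield a score in (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, signals))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical behavior signals observed after the action ran:
# user did not cancel (1.0), did not repeat the command (1.0),
# engaged with the result (1.0).
signals = [1.0, 1.0, 1.0]
score = confidence_score(signals, weights=[1.0, 1.0, 1.0])

# Only train the recognizer on the biased transcription when the
# behavior signals imply the transcription was likely correct.
train_on_biased = score > 0.8
```

    Gating on behavior-derived confidence keeps mistranscribed commands, the ones users cancel or immediately retry, out of the recognizer's training data.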