Compressing audio waveforms using neural networks and vector quantizers

    公开(公告)号:US11600282B2

    公开(公告)日:2023-03-07

    申请号:US17856856

    申请日:2022-07-01

    Applicant: Google LLC

    Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

    MULTI-TASK ADAPTER NEURAL NETWORKS
    32.
    发明申请

    公开(公告)号:US20220383112A1

    公开(公告)日:2022-12-01

    申请号:US17764005

    申请日:2020-09-23

    Applicant: Google LLC

    Abstract: A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.

    SEGMENT-BASED SPEAKER VERIFICATION USING DYNAMICALLY GENERATED PHRASES

    公开(公告)号:US20210295850A1

    公开(公告)日:2021-09-23

    申请号:US17303928

    申请日:2021-06-10

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

    Training Keyword Spotters
    35.
    发明申请

    公开(公告)号:US20210183367A1

    公开(公告)日:2021-06-17

    申请号:US16717518

    申请日:2019-12-17

    Applicant: Google LLC

    Abstract: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.

    Self-Supervised Audio Representation Learning for Mobile Devices

    公开(公告)号:US20210056980A1

    公开(公告)日:2021-02-25

    申请号:US16548146

    申请日:2019-08-22

    Applicant: Google LLC

    Abstract: Systems and methods for training a machine-learned model are provided. A method can include can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

    Object detection using neural network systems

    公开(公告)号:US10467493B2

    公开(公告)日:2019-11-05

    申请号:US15650790

    申请日:2017-07-14

    Applicant: Google LLC

    Abstract: Systems, methods, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a system includes initial neural network layers configured to: receive an input image, and process the input image to generate a plurality of first feature maps that characterize the input image; a location generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate data defining a respective location of each of a predetermined number of bounding boxes in the input image, wherein each bounding box identifies a respective first region of the input image; and a confidence score generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate a confidence score for each of the predetermined number of bounding boxes in the input image.

    Frequency based audio analysis using neural networks

    公开(公告)号:US10460747B2

    公开(公告)日:2019-10-29

    申请号:US15151362

    申请日:2016-05-10

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.

    Audio data classification
    39.
    发明授权

    公开(公告)号:US10424321B1

    公开(公告)日:2019-09-24

    申请号:US13932158

    申请日:2013-07-01

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing an audio sample to determine whether the audio sample includes music audio data. One or more detectors, including a spectral fluctuation detector, a peak repetition detector, and a beat pitch detector, may analyze the audio sample and generate a score that represents whether the audio sample includes music audio data. One or more of the scores may be combined to determine whether the audio sample includes music audio data or non-music audio data.

Patent Agency Ranking