Robustness Aware Norm Decay for Quantization Aware Training and Generalization

    Publication No.: US20240347043A1

    Publication date: 2024-10-17

    Application No.: US18632237

    Filing date: 2024-04-10

    Applicant: Google LLC

    CPC classification number: G10L15/063

    Abstract: A method includes obtaining a plurality of training samples, determining a minimum integer fixed-bit width representing a maximum quantization of an automatic speech recognition (ASR) model, and training the ASR model on the plurality of training samples using a quantity of random noise. The ASR model includes a plurality of weights that each include a respective float value. The quantity of random noise is based on the minimum integer fixed-bit width. After training the ASR model, the method also includes selecting a target integer fixed-bit width greater than or equal to the minimum integer fixed-bit width, and for each respective weight of the plurality of weights, quantizing the respective weight from the respective float value to a respective integer associated with a value of the selected target integer fixed-bit width. The method also includes providing the quantized trained ASR model to a user device.
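
The flow described in this abstract (noise scaled to the minimum bit width during training, then quantization at a target width that is at least that minimum) can be illustrated with a minimal sketch. The abstract does not give the noise distribution or the quantizer, so everything concrete here is an assumption: a symmetric per-tensor uniform quantizer, uniform noise of half a quantization step, and the hypothetical helper names `quant_step`, `add_quantization_noise`, and `quantize`.

```python
import random


def quant_step(weights, bits):
    """Step size of a symmetric uniform quantizer (assumed scheme, not from the patent)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero step for all-zero weights
    return max_abs / (2 ** (bits - 1) - 1)


def add_quantization_noise(weights, min_bits, rng):
    """Perturb float weights with noise sized to the coarsest (minimum) bit width.

    Assumption: uniform noise in [-step/2, step/2], which mimics the rounding
    error a quantizer with `min_bits` bits would introduce.
    """
    step = quant_step(weights, min_bits)
    return [w + rng.uniform(-step / 2, step / 2) for w in weights]


def quantize(weights, target_bits, min_bits):
    """Map float weights to integers at a target width >= the minimum width."""
    if target_bits < min_bits:
        raise ValueError("target width must be >= the minimum width used in training")
    step = quant_step(weights, target_bits)
    qmax = 2 ** (target_bits - 1) - 1
    ints = [max(-qmax, min(qmax, round(w / step))) for w in weights]
    return ints, step


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.6, -1.0, 0.2, 0.0]
    noisy = add_quantization_noise(weights, min_bits=4, rng=rng)  # used in the training forward pass
    ints, step = quantize(weights, target_bits=8, min_bits=4)     # done once, after training
    print(noisy, ints, step)
```

In this sketch the noise is applied only during training, while `quantize` runs once afterward; a model trained against 4-bit-sized noise could then be exported at 4 bits or at any wider width.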

    Optimizing Personal VAD for On-Device Speech Recognition

    Publication No.: US20230298591A1

    Publication date: 2023-09-21

    Application No.: US18123060

    Filing date: 2023-03-17

    Applicant: Google LLC

    CPC classification number: G10L17/06 G10L17/22

    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
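
The FiLM step in this abstract is an element-wise affine transformation: a scaling vector and a shifting vector are derived from the target speaker embedding and applied to the reference speaker embedding. The sketch below shows that arithmetic only. How the FiLM parameters and the classification output are actually produced is not stated in the abstract, so the linear projections, the sigmoid classifier, and all function and parameter names are assumptions made for illustration.

```python
import math


def linear(matrix, bias, vec):
    """Plain matrix-vector product plus bias."""
    return [sum(m * v for m, v in zip(row, vec)) + b for row, b in zip(matrix, bias)]


def film_parameters(target_emb, w_scale, b_scale, w_shift, b_shift):
    """Derive the FiLM scaling and shifting vectors from the target speaker embedding.

    Assumption: each vector comes from its own linear projection.
    """
    gamma = linear(w_scale, b_scale, target_emb)  # scaling vector
    beta = linear(w_shift, b_shift, target_emb)   # shifting vector
    return gamma, beta


def film_modulate(reference_emb, gamma, beta):
    """Affine transformation: scale and shift the reference embedding element-wise."""
    return [g * r + b for g, r, b in zip(gamma, reference_emb, beta)]


def classify(modulated, w_out, b_out):
    """Hypothetical classifier: probability that the target speaker spoke the utterance."""
    logit = sum(w * m for w, m in zip(w_out, modulated)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    target = [1.0, 2.0]
    reference = [3.0, 4.0]
    gamma, beta = film_parameters(
        target,
        w_scale=[[1.0, 0.0], [0.0, 1.0]], b_scale=[0.0, 0.0],
        w_shift=[[0.0, 0.0], [0.0, 0.0]], b_shift=[0.5, -0.5],
    )
    modulated = film_modulate(reference, gamma, beta)
    print(modulated, classify(modulated, w_out=[0.1, -0.1], b_out=0.0))
```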

    4-bit Conformer with Accurate Quantization Training for Speech Recognition

    Publication No.: US20230298569A1

    Publication date: 2023-09-21

    Application No.: US18186774

    Filing date: 2023-03-20

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L15/16

    Abstract: A method for training a model includes obtaining a plurality of training samples. Each respective training sample of the plurality of training samples includes a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method includes training, using quantization aware training with native integer operations, an automatic speech recognition (ASR) model on the plurality of training samples. The method also includes quantizing the trained ASR model to an integer target fixed-bit width. The quantized trained ASR model includes a plurality of weights. Each weight of the plurality of weights includes an integer with the target fixed-bit width. The method includes providing the quantized trained ASR model to a user device.
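
To make "native integer operations" at a fixed bit width concrete, the sketch below quantizes weights and inputs to 4-bit signed integers, accumulates their dot product entirely in integer arithmetic, and rescales only the final result. This is an illustrative assumption, not the patented training procedure: the symmetric per-vector scaling and the names `quantize_int`, `int_accumulate`, and `rescale` are hypothetical.

```python
def quantize_int(values, bits=4):
    """Quantize floats to signed integers of the given width (assumed symmetric scheme)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # [-8, 7] for 4 bits
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    q = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_accumulate(q_weights, q_inputs):
    """Dot product carried out purely on integers, as integer hardware would."""
    return sum(w * x for w, x in zip(q_weights, q_inputs))


def rescale(acc, weight_scale, input_scale):
    """Convert the integer accumulator back to a float output."""
    return acc * weight_scale * input_scale


if __name__ == "__main__":
    weights = [1.4, -0.6, 0.2]
    inputs = [0.7, 0.7, 0.7]
    q_w, s_w = quantize_int(weights)
    q_x, s_x = quantize_int(inputs)
    acc = int_accumulate(q_w, q_x)
    print(q_w, q_x, acc, rescale(acc, s_w, s_x))
```

Keeping the multiply-accumulate in integers during training means the forward pass sees the same arithmetic the device will run, rather than a float simulation of it.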
