MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR

    Publication No.: US20240304185A1

    Publication date: 2024-09-12

    Application No.: US18598885

    Application date: 2024-03-07

    Applicant: Google LLC

    CPC classification number: G10L15/197 G10L15/02 G10L15/063

    Abstract: A method for a multilingual automatic speech recognition (ASR) model includes receiving a sequence of acoustic frames characterizing an utterance of speech. At each of a plurality of output steps, the method further includes generating a first higher order feature representation for an acoustic frame by a first encoder that includes a first plurality of multi-head attention layers; generating a second higher order feature representation for a corresponding first higher order feature representation by a second encoder that includes a second plurality of multi-head attention layers; and generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on the second higher order feature representation and a sequence of N previous non-blank symbols. A gating layer of each respective mixture-of-experts (MoE) layer is configured to dynamically route an output from a previous multi-head attention layer, at each of the plurality of output steps, to a respective pair of feed-forward expert networks.
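
The routing step described in the abstract above can be illustrated with a minimal sketch. The abstract only says that a gating layer routes each attention output to a pair of feed-forward experts. The softmax gate, the top-2 selection rule, the renormalized mixing weights, and every name below (`moe_layer`, `gate_weights`, the toy experts) are assumptions made for illustration, not details taken from the patent.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(frame, gate_weights, experts, top_k=2):
    """Route one attention-output vector to its top_k experts and mix their outputs.

    Assumption: the gate is a linear projection followed by softmax, and the
    selected experts are combined with renormalized gate probabilities.
    """
    # One gating score per expert.
    logits = [sum(w * x for w, x in zip(row, frame)) for row in gate_weights]
    probs = softmax(logits)
    # Keep only the top_k highest-scoring experts (a pair when top_k == 2).
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(frame)
    for i in chosen:
        expert_out = experts[i](frame)
        for d in range(len(out)):
            out[d] += (probs[i] / norm) * expert_out[d]
    return out, chosen

# Toy stand-ins for feed-forward experts: each one just scales its input.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, chosen = moe_layer([2.0, 1.0], gate_weights, experts)
print(chosen, output)
```

Only the two selected experts are evaluated for a given frame, so per-frame compute stays roughly constant as more experts are added. That is the usual motivation for sparse routing of this kind.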

    Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification

    Publication No.: US20230306958A1

    Publication date: 2023-09-28

    Application No.: US18188632

    Application date: 2023-03-23

    Applicant: Google LLC

    CPC classification number: G10L15/005 G10L15/16 G10L15/063

    Abstract: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.
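
The data flow in the abstract above (two cascaded encoders, a language ID predictor fed by both encoder outputs, and a decoder fed by the second encoder output plus the language prediction) can be sketched as follows. All four components are hypothetical toy functions; only the order of operations and the two concatenations come from the abstract.

```python
def concat(a, b):
    """Concatenate two feature vectors."""
    return list(a) + list(b)

def recognize_step(frame, first_encoder, second_encoder, lang_id_predictor, decoder):
    """One output step: returns (hypothesis distribution, language prediction)."""
    h1 = first_encoder(frame)                 # first higher order representation
    h2 = second_encoder(h1)                   # second higher order representation
    lang = lang_id_predictor(concat(h1, h2))  # language prediction from both encoders
    dist = decoder(concat(h2, lang))          # decoder sees h2 plus the language prediction
    return dist, lang

# Toy stand-ins (assumptions, not the patent's networks).
first_encoder = lambda f: [2.0 * x for x in f]
second_encoder = lambda h: [x + 1.0 for x in h]
lang_id_predictor = lambda v: [1.0, 0.0] if sum(v) > 0 else [0.0, 1.0]

def decoder(v):
    # Normalize non-negative inputs so the output sums to 1.
    total = sum(v)
    return [x / total for x in v]

dist, lang = recognize_step([1.0, 2.0], first_encoder, second_encoder,
                            lang_id_predictor, decoder)
print(lang, dist)
```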

    Systems and Methods for Training Dual-Mode Machine-Learned Speech Recognition Models

    Publication No.: US20230237993A1

    Publication date: 2023-07-27

    Application No.: US18011571

    Application date: 2021-10-01

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/32 G10L15/22

    Abstract: Systems and methods of the present disclosure are directed to a computing system that includes one or more processors and a machine-learned multi-mode speech recognition model configured to operate in a streaming recognition mode or a contextual recognition mode. The computing system can perform operations including obtaining speech data and a ground truth label and processing the speech data using the contextual recognition mode to obtain contextual prediction data. The operations can include evaluating a difference between the contextual prediction data and the ground truth label and processing the speech data using the streaming recognition mode to obtain streaming prediction data. The operations can include evaluating a difference between the streaming prediction data and the ground truth label, and a difference between the contextual prediction data and the streaming prediction data. The operations can include adjusting parameters of the speech recognition model.
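
The three "differences" named in the abstract above can be combined into a single training objective. The sketch below is one plausible reading, not the patent's formulation. It assumes cross-entropy for the two label terms, a KL divergence for the contextual-versus-streaming term, and a simple weighted sum. The function names and the `distill_weight` parameter are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the ground truth label."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_mode_loss(contextual_probs, streaming_probs, label, distill_weight=1.0):
    """Sum of the three terms the abstract describes (assumed loss functions)."""
    contextual_term = cross_entropy(contextual_probs, label)  # contextual vs. label
    streaming_term = cross_entropy(streaming_probs, label)    # streaming vs. label
    # Contextual vs. streaming: pulls the streaming mode toward the contextual mode.
    distill_term = kl_divergence(contextual_probs, streaming_probs)
    return contextual_term + streaming_term + distill_weight * distill_term

loss = dual_mode_loss([0.7, 0.2, 0.1], [0.5, 0.3, 0.2], label=0)
same = dual_mode_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1], label=0)
print(loss, same)
```

When both modes produce the same distribution, the third term vanishes and only the two label terms remain. In a real system the resulting scalar would drive the parameter update through backpropagation.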

    EFFICIENT IMAGE DATA DELIVERY FOR AN ARRAY OF PIXEL MEMORY CELLS

    Publication No.: US20230147106A1

    Publication date: 2023-05-11

    Application No.: US18150724

    Application date: 2023-01-05

    Applicant: GOOGLE LLC

    CPC classification number: G09G3/3688 G09G2360/12

    Abstract: A backplane design for efficiently delivering image data to a memory cell that forms part of a pixel driver comprises a word line design and a column data register release signal delivery design that are speed matched, and a complementary bit line delivery design that is speed matched to a row decoder signal circuit. The row decoder signal circuit is operative to pull a word line driver to a state that enables the memory circuits of that row to receive data from the column drivers for each column. The speed matching is effective over a range of operating temperatures because the circuit designs are substantially identical.

    Unified End-To-End Speech Recognition And Endpointing Using A Switch Connection

    Publication No.: US20240029719A1

    Publication date: 2024-01-25

    Application No.: US18340093

    Application date: 2023-06-23

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/063 G10L25/93

    Abstract: A single end-to-end (E2E) multitask model includes a speech recognition model and an endpointer model. The speech recognition model includes an audio encoder configured to encode a sequence of audio frames into corresponding higher-order feature representations, and a decoder configured to generate probability distributions over possible speech recognition hypotheses for the sequence of audio frames based on the higher-order feature representations. The endpointer model is configured to switch between a voice activity detection (VAD) mode and an end-of-query (EOQ) detection mode. During the VAD mode, the endpointer model receives input audio frames and determines, for each input audio frame, whether the input audio frame includes speech. During the EOQ detection mode, the endpointer model receives latent representations for the sequence of audio frames output from the audio encoder and determines, for each of the latent representations, whether the latent representation includes final silence.
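
The switch between the two endpointer modes described above can be sketched as a single dispatch function: in VAD mode the endpointer consumes raw audio frames, and in EOQ mode it consumes the encoder's latent representations. The threshold classifiers and all names below are toy assumptions, not the patent's models.

```python
def endpointer_step(mode, audio_frame, latent, vad_classifier, eoq_classifier):
    """Select the endpointer input according to the active mode."""
    if mode == "VAD":
        # VAD mode: decide from the raw audio frame whether speech is present.
        return vad_classifier(audio_frame)
    if mode == "EOQ":
        # EOQ mode: decide from the encoder latent whether this is final silence.
        return eoq_classifier(latent)
    raise ValueError(f"unknown endpointer mode: {mode!r}")

# Toy classifiers (assumptions): a mean-energy threshold and a latent-score threshold.
vad_classifier = lambda frame: sum(x * x for x in frame) / len(frame) > 0.01
eoq_classifier = lambda latent: sum(latent) / len(latent) < 0.0

has_speech = endpointer_step("VAD", [0.3, -0.2, 0.4], None, vad_classifier, eoq_classifier)
final_silence = endpointer_step("EOQ", None, [-0.5, -0.1], vad_classifier, eoq_classifier)
print(has_speech, final_silence)
```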
