Joint Speech and Text Streaming Model for ASR

    Publication No.: US20240028829A1

    Publication Date: 2024-01-25

    Application No.: US18346232

    Filing Date: 2023-07-01

    Applicant: Google LLC

    IPC Classification: G06F40/284 G06F40/40

    CPC Classification: G06F40/284 G06F40/40

    Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.

    Backplane for an array of emissive elements

    Publication No.: US11847957B2

    Publication Date: 2023-12-19

    Application No.: US17552158

    Filing Date: 2021-12-15

    Applicant: GOOGLE LLC

    IPC Classification: G09G3/32 G11C11/412

    Abstract: A plurality of pixel drive circuits form part of an array of emissive elements. The plurality of pixel drive circuits are disposed to form a plurality of rows and a plurality of columns. The plurality of pixel drive circuits are organized into sets of pixel drive circuits, and each set comprises at least one pixel drive circuit. A FET of a set of pixel drive circuits shares a common well with other FETs of similar function in the same set of pixel drive circuits positioned therein, such that the variance of the threshold voltages of those FETs is substantially reduced. Each of the pixel drive circuits comprises a circuit operative to deliver a current at a predetermined voltage to an emissive device and a memory circuit operative to receive modulation data and to use that data to modulate the current output of the pixel drive circuit.

    Learning word-level confidence for subword end-to-end automatic speech recognition

    Publication No.: US11610586B2

    Publication Date: 2023-03-21

    Application No.: US17182592

    Filing Date: 2021-02-23

    Applicant: Google LLC

    Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
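
The abstract does not say how sub-word scores become word-level scores or how the word-level scores are aggregated. As a minimal sketch only, the Python below assumes three things that are not stated in the abstract: sub-word units mark a word start with the SentencePiece-style "▁" prefix, a word takes the confidence of its final sub-word unit, and the utterance-level score is the arithmetic mean of the word-level scores. The function names are hypothetical.

```python
from typing import List, Tuple

WORD_START = "\u2581"  # assumed SentencePiece-style word-boundary marker


def word_confidences(pieces: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Group sub-word units into words; each word takes its last unit's score (assumption)."""
    if len(pieces) != len(scores):
        raise ValueError("pieces and scores must have the same length")
    words: List[Tuple[str, float]] = []
    current, last_score = "", 0.0
    for piece, score in zip(pieces, scores):
        # A marked piece closes the word built so far and starts a new one.
        if piece.startswith(WORD_START) and current:
            words.append((current, last_score))
            current = ""
        current += piece.lstrip(WORD_START)
        last_score = score
    if current:
        words.append((current, last_score))
    return words


def utterance_confidence(words: List[Tuple[str, float]]) -> float:
    """Aggregate word-level scores with an arithmetic mean (assumption)."""
    if not words:
        return 0.0
    return sum(score for _, score in words) / len(words)


if __name__ == "__main__":
    demo = word_confidences(["\u2581good", "\u2581mor", "ning"], [0.9, 0.4, 0.8])
    print(demo, utterance_confidence(demo))
```

Other aggregation rules (minimum, product) would fit the same interface; the mean is used here only because it is the simplest to read.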

    Larger backplane suitable for high speed applications

    Publication No.: US11538431B2

    Publication Date: 2022-12-27

    Application No.: US17354419

    Filing Date: 2021-06-22

    Applicant: Google LLC

    Inventors: Bo Li, Kaushik Sheth

    IPC Classification: G09G3/30 G09G3/36

    Abstract: A display system comprises a plurality of display controller circuits controlling a like number of independent segments of pixel drive circuits of a backplane. Each pixel drive circuit comprises a memory element and associated pixel drive circuitry. The segments of the backplane may be organized vertically. The word line for the memory cells of a first segment of pixel drive circuits passes underneath a second segment of pixel drive circuits without directly interacting with the pixel drive circuits of the second segment in order to reach the pixel drive circuits of the first segment. The plurality of display controller circuits operate asynchronously but are kept at the same frame rate by an external signal such as Vsync.

    Joint Endpointing And Automatic Speech Recognition

    Publication No.: US20200335091A1

    Publication Date: 2020-10-22

    Application No.: US16809403

    Filing Date: 2020-03-04

    Applicant: Google LLC

    Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
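
The control flow in this abstract (emit partial results, watch for the endpoint indication, then stop processing further audio) can be sketched as a streaming loop. The Python below is an illustration under stated assumptions, not the patented model: `step` is a hypothetical callable standing in for the joint recognition and endpointing model, and `make_toy_step` is a toy stand-in that treats the string "<eos>" as the endpoint.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical model interface: one audio chunk in, (partial result, endpoint flag) out.
StepFn = Callable[[str], Tuple[str, bool]]


def recognize_until_endpoint(chunks: Iterable[str], step: StepFn) -> Tuple[List[str], int]:
    """Feed chunks to the model until it signals the end of the utterance."""
    partials: List[str] = []
    consumed = 0
    for chunk in chunks:
        partial, ended = step(chunk)
        consumed += 1
        partials.append(partial)
        if ended:
            # Endpoint detected: stop processing any subsequent audio.
            break
    return partials, consumed


def make_toy_step() -> StepFn:
    """Toy stand-in for the model: chunks are words, "<eos>" marks the endpoint."""
    words: List[str] = []

    def step(chunk: str) -> Tuple[str, bool]:
        if chunk == "<eos>":
            return " ".join(words), True
        words.append(chunk)
        return " ".join(words), False

    return step


if __name__ == "__main__":
    stream = iter(["turn", "on", "<eos>", "ignored"])
    print(recognize_until_endpoint(stream, make_toy_step()))
```

Because the loop breaks on the endpoint flag, the chunk after "<eos>" is never handed to the model.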

    ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION

    Publication No.: US20180197534A1

    Publication Date: 2018-07-12

    Application No.: US15848829

    Filing Date: 2017-12-20

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
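
The abstract does not state how the two filtered channels become a single combined channel. One conventional way to do this is filter-and-sum beamforming: convolve each channel with its own FIR filter and add the results. The Python below sketches that combination step only, assuming the filter taps are given; in the abstract they would be predicted from both channels, which is not modeled here. The function names are hypothetical.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum_k taps[k] * signal[n - k]."""
    out: List[float] = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(
    ch1: Sequence[float],
    ch2: Sequence[float],
    taps1: Sequence[float],
    taps2: Sequence[float],
) -> List[float]:
    """Filter each channel with its own taps, then sum into one channel."""
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of samples")
    y1 = fir_filter(ch1, taps1)
    y2 = fir_filter(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]


if __name__ == "__main__":
    print(filter_and_sum([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.5], [0.5, 0.5]))
```

In an adaptive setup the taps would be recomputed per utterance or per frame; fixed taps keep the sketch short.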

    Universal Monolingual Output Layer for Multilingual Speech Recognition

    Publication No.: US20240135923A1

    Publication Date: 2024-04-25

    Application No.: US18485271

    Filing Date: 2023-10-11

    Applicant: Google LLC

    Abstract: A method includes receiving a sequence of acoustic frames as input to a multilingual automated speech recognition (ASR) model configured to recognize speech in a plurality of different supported languages and generating, by an audio encoder of the multilingual ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a language identification (LID) predictor of the multilingual ASR model, a language prediction representation for a corresponding higher order feature representation. The method also includes generating, by a decoder of the multilingual ASR model, a probability distribution over possible speech recognition results based on the corresponding higher order feature representation, a sequence of non-blank symbols, and a corresponding language prediction representation. The decoder includes a monolingual output layer having a plurality of output nodes each sharing a plurality of language-specific wordpiece models.
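
One way to read "output nodes each sharing a plurality of language-specific wordpiece models" is that the same output node index maps to a different wordpiece depending on the predicted language. The Python below is a sketch under that interpretation only; the tables, the "<blank>" entry, the "▁" word marker, and the function name are all hypothetical and do not come from the abstract.

```python
from typing import Dict, List

# Hypothetical per-language wordpiece tables; list index i is shared output node i.
WORDPIECE_TABLES: Dict[str, List[str]] = {
    "en": ["<blank>", "\u2581hel", "lo", "\u2581world"],
    "es": ["<blank>", "\u2581ho", "la", "\u2581mundo"],
}


def decode_nodes(node_ids: List[int], language: str) -> str:
    """Map shared output node indices to text using the table for one language."""
    table = WORDPIECE_TABLES[language]
    # Skip blank emissions, then join wordpieces and restore spaces.
    pieces = [table[i] for i in node_ids if table[i] != "<blank>"]
    return "".join(pieces).replace("\u2581", " ").strip()


if __name__ == "__main__":
    nodes = [1, 0, 2, 3]
    print(decode_nodes(nodes, "en"))
    print(decode_nodes(nodes, "es"))
```

The same node sequence yields different text per language, so the output layer size stays that of a single monolingual vocabulary.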