Patent search: ap:("Google LLC") AND inv:"Bo Li", page 1

1.

Invention publication
Joint Speech and Text Streaming Model for ASR (status: pending, published)

Publication number: US20240028829A1

Publication date: 2024-01-25

Application number: US18346232

Filing date: 2023-07-01

Applicant: Google LLC

Inventors: Tara N. Sainath, Zhouyuan Huo, Zhehuai Chen, Yu Zhang, Weiran Wang, Trevor Strohman, Rohit Prakash Prabhavalkar, Bo Li, Ankur Bapna

IPC classification: G06F40/284, G06F40/40

CPC classification: G06F40/284, G06F40/40

Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.

2.

Granted invention patent
Backplane for an array of emissive elements (status: in force)

Publication number: US11847957B2

Publication date: 2023-12-19

Application number: US17552158

Filing date: 2021-12-15

Applicant: GOOGLE LLC

Inventors: Edwin Lyle Hudson, Bo Li

IPC classification: G09G3/32, G11C11/412

CPC classification: G09G3/32, G11C11/412, G09G2300/0842, G09G2310/0297

Abstract: A plurality of pixel drive circuits form part of an array of emissive elements. The plurality of pixel drive circuits are disposed to form a plurality of rows and a plurality of columns. The plurality of pixel drive circuits are organized into sets of pixel drive circuits, and each set comprises at least one pixel drive circuit. A FET of a set of pixel drive circuits shares a common well with other FETs of similar function in the same set of pixel drive circuits positioned therein, such that the variance of the threshold voltages of those FETs is substantially reduced. Each of the pixel drive circuits comprises a circuit operative to deliver a current at a predetermined voltage to an emissive device and a memory circuit operative to receive modulation data and to use same to modulate the current output of the pixel drive circuit.

3.

Granted invention patent
Adaptive audio enhancement for multichannel speech recognition (status: in force)

Publication number: US11756534B2

Publication date: 2023-09-12

Application number: US17649058

Filing date: 2022-01-26

Applicant: Google LLC

Inventors: Bo Li, Ron Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC classification: G10L15/00, G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L21/0216

CPC classification: G10L15/16, G10L15/20, G10L21/0224, G10L15/26, G10L2021/02166

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
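
The combining step in this abstract (two per-channel filters producing a single channel) resembles classic filter-and-sum beamforming. The following is a minimal sketch of that combination only, not the patented method: in the abstract the filter parameters are generated from the audio itself, whereas here `taps1` and `taps2` are simply passed in, and the function names are hypothetical.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Apply a causal FIR filter: y[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(
    ch1: Sequence[float],
    ch2: Sequence[float],
    taps1: Sequence[float],
    taps2: Sequence[float],
) -> List[float]:
    """Filter each channel with its own taps, then sum into one channel.

    Assumption: taps1/taps2 stand in for the filter parameters that the
    abstract says are generated from both channels of audio data.
    """
    if len(ch1) != len(ch2):
        raise ValueError("both channels must have the same number of samples")
    y1 = fir_filter(ch1, taps1)
    y2 = fir_filter(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]


combined = filter_and_sum([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0], [0.0, 1.0])
```

Here the second channel is delayed by one sample before summation, which is the simplest case of steering with per-channel filters.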

4.

Granted invention patent
Learning word-level confidence for subword end-to-end automatic speech recognition (status: in force)

Publication number: US11610586B2

Publication date: 2023-03-21

Application number: US17182592

Filing date: 2021-02-23

Applicant: Google LLC

Inventors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara Sainath, Ian Mcgraw

IPC classification: G10L15/22, G10L15/08, G06N3/08, G10L25/30

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
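
The abstract describes going from sub-word confidence scores to word-level and then utterance-level scores, but does not say how each level is derived. The sketch below is one possible reading under explicit assumptions: sub-word units use a SentencePiece-style "▁" word-start marker, a word takes the score of its final sub-word unit, and the utterance score is the mean of the word scores. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

WORD_START = "\u2581"  # "▁", assumed SentencePiece-style word-boundary marker


def word_confidences(
    pieces: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Group sub-word units into words and assign each word a confidence.

    Assumption: a word's confidence is the score of its last sub-word unit.
    """
    if len(pieces) != len(scores):
        raise ValueError("pieces and scores must have the same length")
    words: List[Tuple[str, float]] = []
    for piece, score in zip(pieces, scores):
        if piece.startswith(WORD_START) or not words:
            # A new word begins; strip the boundary marker from its text.
            words.append((piece.lstrip(WORD_START), score))
        else:
            # Continue the current word; the later unit's score replaces it.
            text, _ = words[-1]
            words[-1] = (text + piece, score)
    return words


def utterance_confidence(words: Sequence[Tuple[str, float]]) -> float:
    """Aggregate word-level scores (assumed here: arithmetic mean)."""
    if not words:
        return 0.0
    return sum(score for _, score in words) / len(words)


words = word_confidences(["\u2581good", "\u2581morn", "ing"], [0.9, 0.4, 0.8])
utt = utterance_confidence(words)
```

Other aggregations (minimum, product) would fit the same interface; the mean is used only because it is easy to inspect.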

5.

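Granted invention patent
Larger backplane suitable for high speed applications (status: in force)

Publication number: US11538431B2

Publication date: 2022-12-27

Application number: US17354419

Filing date: 2021-06-22

Applicant: Google LLC

Inventors: Bo Li, Kaushik Sheth

IPC classification: G09G3/30, G09G3/36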

Abstract: A display system comprising a plurality of display controller circuits controlling a like number of independent segments of pixel drive circuits of a backplane. Each pixel drive circuit comprises a memory element and associated pixel drive circuitry. The segments of the backplane may be organized vertically. The word line for the memory cells of a first segment of pixel drive circuits passes underneath a second segment of pixel drive circuits without directly interacting with the pixel drive circuits of the second segment in order to reach the pixel drive circuits of the first segment. The plurality of display controller circuits operate asynchronously but are kept at the same frame rate by an external signal such as Vsync.

6.

Invention application
Joint Endpointing And Automatic Speech Recognition (status: pending, published)

Publication number: US20200335091A1

Publication date: 2020-10-22

Application number: US16809403

Filing date: 2020-03-04

Applicant: Google LLC

Inventors: Shuo-yiin Chang, Rohit Prakash Prabhavalkar, Gabor Simko, Tara N. Sainath, Bo Li, Yangzhang He

IPC classification: G10L15/16, G10L15/14, G10L15/28, G10L15/02

Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
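
The control flow in this abstract (emit partial results, watch for an endpoint indication, stop processing later audio) can be sketched as a simple streaming loop. This is an illustration under assumptions, not the claimed implementation: `step` is a hypothetical stand-in for one model call that returns a partial result together with an endpoint flag.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def recognize_until_endpoint(
    frames: Iterable[Frame],
    step: Callable[[Frame], Tuple[str, bool]],
) -> Tuple[List[str], int]:
    """Process frames in order and stop once the model signals an endpoint.

    `step` (hypothetical) returns (partial_result, utterance_ended) for one
    frame. Returns the partial results and the number of frames consumed.
    """
    partials: List[str] = []
    consumed = 0
    for frame in frames:
        consumed += 1
        partial, ended = step(frame)
        partials.append(partial)
        if ended:
            # Endpoint detected: frames after this one are never processed.
            break
    return partials, consumed


def toy_step(frame: int) -> Tuple[str, bool]:
    """Toy model: a frame value of 0 is treated as the end of the utterance."""
    return f"partial-{frame}", frame == 0


partials, consumed = recognize_until_endpoint([3, 5, 0, 7, 9], toy_step)
```

Because the endpoint flag comes from the same call as the partial result, no separate endpointer has to be run alongside the recognizer in this sketch.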

7.

Invention application
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS (status: pending, published)

Publication number: US20200027444A1

Publication date: 2020-01-23

Application number: US16516390

Filing date: 2019-07-19

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A.U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen

IPC classification: G10L15/16, G10L15/22, G10L15/06, G10L15/02, G06N3/08

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
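
The training-time selection described here, choosing between a known value and the decoder's own output with a predetermined probability, is commonly known as scheduled sampling. A minimal sketch of that selection step follows; the function name, the token representation, and the parameter `sample_prob` are assumptions for illustration and do not come from the patent record.

```python
import random
from typing import List, Optional, Sequence


def choose_decoder_inputs(
    targets: Sequence[str],
    predictions: Sequence[str],
    sample_prob: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick, per position, the decoder's own output or the known value.

    With probability `sample_prob` the decoder's prediction is used;
    otherwise the ground-truth value from the training example is used.
    """
    if not 0.0 <= sample_prob <= 1.0:
        raise ValueError("sample_prob must be in [0, 1]")
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    rng = rng or random.Random()
    return [
        pred if rng.random() < sample_prob else tgt
        for tgt, pred in zip(targets, predictions)
    ]


truth = ["the", "cat", "sat"]
model_out = ["the", "cap", "sat"]
mixed = choose_decoder_inputs(truth, model_out, 0.25, random.Random(0))
```

Setting `sample_prob` to 0.0 reduces this to plain teacher forcing, while 1.0 feeds the decoder only its own outputs.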

8.

Invention application
ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION (status: pending, published)

Publication number: US20180197534A1

Publication date: 2018-07-12

Application number: US15848829

Filing date: 2017-12-20

Applicant: Google LLC

Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson

IPC classification: G10L15/16, G10L21/0224, G10L15/26, G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

9.

Granted invention patent
Speech recognition with sequence-to-sequence models (status: in force)

Publication number: US12106749B2

Publication date: 2024-10-01

Application number: US17448119

Filing date: 2021-09-20

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen

IPC classification: G10L15/00, G06N3/08, G10L15/02, G10L15/06, G10L15/16, G10L15/22, G10L25/30, G10L15/26

CPC classification: G10L15/16, G06N3/08, G10L15/02, G10L15/063, G10L15/22, G10L25/30, G10L2015/025, G10L15/26

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

10.

Invention publication
Universal Monolingual Output Layer for Multilingual Speech Recognition (status: pending, published)

Publication number: US20240135923A1

Publication date: 2024-04-25

Application number: US18485271

Filing date: 2023-10-11

Applicant: Google LLC

Inventors: Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Shuo-yiin Chang

IPC classification: G10L15/197, G10L15/00, G10L15/02

CPC classification: G10L15/197, G10L15/005, G10L15/02

Abstract: A method includes receiving a sequence of acoustic frames as input to a multilingual automated speech recognition (ASR) model configured to recognize speech in a plurality of different supported languages and generating, by an audio encoder of the multilingual ASR, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a language identification (LID) predictor of the multilingual ASR, a language prediction representation for a corresponding higher order feature representation. The method also includes generating, by a decoder of the multilingual ASR, a probability distribution over possible speech recognition results based on the corresponding higher order feature representation, a sequence of non-blank symbols, and a corresponding language prediction representation. The decoder includes a monolingual output layer having a plurality of output nodes each sharing a plurality of language-specific wordpiece models.
