Patent search: ap:("Google LLC") AND inv:"Wei Han"

1.

Invention publication
Convolution-Augmented Transformer Models (under examination, published)

Publication (announcement) number: US20240362453A1

Publication (announcement) date: 2024-10-31

Application number: US18766038

Application date: 2024-07-08

Applicant: Google LLC

Inventor: Anmol Gulati, Weikeng Qin, Zhengdong Zhang, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang

IPC: G06N3/04, G06N20/00, G10L15/16

CPC classification number: G06N3/04, G06N20/00, G10L15/16

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.

2.

Invention grant
Convolution-augmented transformer models (in force)

Publication (announcement) number: US12079703B2

Publication (announcement) date: 2024-09-03

Application number: US17139525

Application date: 2020-12-31

Applicant: Google LLC

Inventor: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang

IPC: G06N3/04, G06N20/00, G10L15/16

CPC classification number: G06N3/04, G06N20/00, G10L15/16

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.

3.

Invention publication
Diffusion Models for Generation of Audio Data Based on Descriptive Textual Prompts (under examination, published)

Publication (announcement) number: US20240282294A1

Publication (announcement) date: 2024-08-22

Application number: US18651296

Application date: 2024-04-30

Applicant: Google LLC

Inventor: Qingqing Huang, Daniel Sung-Joon Park, Aren Jansen, Timo Immanuel Denk, Yue Li, Ravi Ganti, Dan Ellis, Tao Wang, Wei Han, Joonseok Lee

IPC: G10L15/06, G10L15/16

CPC classification number: G10L15/063, G10L15/16

Abstract: A corpus of textual data is generated with a machine-learned text generation model. The corpus of textual data includes a plurality of sentences. Each sentence is descriptive of a type of audio. For each of a plurality of audio recordings, the audio recording is processed with a machine-learned audio classification model to obtain training data including the audio recording and one or more sentences of the plurality of sentences closest to the audio recording within a joint audio-text embedding space of the machine-learned audio classification model. The sentence(s) are processed with a machine-learned generation model to obtain an intermediate representation of the one or more sentences. The intermediate representation is processed with a machine-learned cascaded diffusion model to obtain audio data. The machine-learned cascaded diffusion model is trained based on a difference between the audio data and the audio recording.

4.

Invention application
STREAMING OBJECT DETECTION WITHIN SENSOR DATA (in force)

Publication (announcement) number: US20220415042A1

Publication (announcement) date: 2022-12-29

Application number: US17901224

Application date: 2022-09-01

Applicant: Google LLC

Inventor: Jonathon Shlens, Vijay Vasudevan, Jiquan Ngiam, Wei Han, Zhifeng Chen, Brandon Chauloon Yang, Benjamin James Caine, Zhengdong Zhang, Christoph Sprunk, Ouais Alsharif, Junhua Mao, Chen Wu

IPC: G06V20/10, G06T17/00, G01S17/89, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing data generated by a sensing system that rotationally senses an environment. In one aspect, a method comprises partitioning a predetermined period of time into a plurality of sub-periods, wherein the predetermined period of time is a period of time for which data generated by the sensing system constitutes a complete rotational sensing of the environment; for each sub-period: receiving current data generated by the sensing system during the sub-period and characterizing a respective partial scene of the environment; processing the current data using an object detection neural network to generate a current object detection output that is specific to the respective partial scene of the environment.
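The partitioning step described in this abstract can be illustrated with a minimal sketch. It assumes equal-width sub-periods and points tagged with a timestamp relative to the start of the rotation; the names `partition_period`, `stream_detections`, and the stand-in `detect` callable are hypothetical and do not come from the patent.

```python
def partition_period(period_s, num_sub_periods):
    """Split one full rotation period into equal-width sub-periods (an assumption)."""
    width = period_s / num_sub_periods
    return [(i * width, (i + 1) * width) for i in range(num_sub_periods)]


def stream_detections(points, period_s, num_sub_periods, detect):
    """Run `detect` once per sub-period, on that sub-period's data only.

    `points` is a list of (timestamp, payload) pairs, with the timestamp
    measured from the start of the rotation. `detect` is a hypothetical
    stand-in for the object detection neural network.
    """
    outputs = []
    for start, end in partition_period(period_s, num_sub_periods):
        # Data characterizing the partial scene sensed during this sub-period.
        current = [payload for t, payload in points if start <= t < end]
        outputs.append(detect(current))
    return outputs


# Toy data: four points within a 1-second rotation, split into four sub-periods.
example_points = [(0.1, "a"), (0.3, "b"), (0.35, "c"), (0.9, "d")]
# The stand-in detector just counts the points in each partial scene.
example_outputs = stream_detections(example_points, 1.0, 4, detect=len)
```

With the toy detector, `example_outputs` holds one result per sub-period, so a downstream consumer can act on each partial scene before the rotation completes.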

5.

Invention application
Co-Training of Action Recognition Machine Learning Models (in force)

Publication (announcement) number: US20250037426A1

Publication (announcement) date: 2025-01-30

Application number: US18716912

Application date: 2022-12-09

Applicant: Google LLC

Inventor: Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha

IPC: G06V10/764, G06V10/774

Abstract: A method includes obtaining video datasets each including pairs of a training video and a ground-truth action classification of the training video. The method also includes generating an action recognition model that includes a shared encoder model and action classification heads. A number of the action classification heads may be equal to a number of the video datasets, and each action classification head may be configured to, based on an output of the shared encoder model, classify training videos sampled from a corresponding video dataset. The method also includes determining, by the action recognition model and for each training video sampled from the video datasets, an inferred action classification. The method further includes determining a loss value based on the inferred action classifications and the ground-truth action classifications, and adjusting parameters of the action recognition model based on the loss value.
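The shared-encoder, one-head-per-dataset arrangement in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the encoder is a single linear layer with ReLU, each head is a linear softmax classifier, the loss is mean cross-entropy, and every name here (`encode`, `classify`, `co_training_loss`) is hypothetical. The parameter update step is left to whichever optimizer is used.

```python
import math


def encode(features, encoder_weights):
    """Shared encoder (assumed: one linear layer followed by ReLU)."""
    return [max(0.0, sum(w * x for w, x in zip(row, features)))
            for row in encoder_weights]


def classify(embedding, head_weights):
    """One classification head: linear layer followed by a softmax."""
    logits = [sum(w * e for w, e in zip(row, embedding)) for row in head_weights]
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def co_training_loss(datasets, encoder_weights, heads):
    """Mean cross-entropy over all datasets, using one head per dataset."""
    assert len(heads) == len(datasets)
    losses = []
    for head, dataset in zip(heads, datasets):
        for features, label in dataset:
            probs = classify(encode(features, encoder_weights), head)
            losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Toy setup: two datasets with different label spaces (2 and 3 classes).
dataset_a = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
dataset_b = [([1.0, 1.0], 2)]
encoder = [[0.0, 0.0], [0.0, 0.0]]                 # 2-d input -> 2-d embedding
head_a = [[0.0, 0.0], [0.0, 0.0]]                  # 2 classes
head_b = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]      # 3 classes
loss = co_training_loss([dataset_a, dataset_b], encoder, [head_a, head_b])
```

Because all weights start at zero, every head predicts a uniform distribution, so the initial loss is the average of ln 2 (twice) and ln 3 (once). A single loss value spanning all datasets is what lets one backward pass update the shared encoder together with every head.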

6.

Invention application
Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization (in force)

Publication (announcement) number: US20220122586A1

Publication (announcement) date: 2022-04-21

Application number: US17447285

Application date: 2021-09-09

Applicant: Google LLC

Inventor: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang

IPC: G10L15/06, G10L15/22, G10L15/30, G10L15/16

Abstract: A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.

7.

Invention publication
Contrastive Learning and Masked Modeling for End-To-End Self-Supervised Pre-Training (under examination, published)

Publication (announcement) number: US20240104352A1

Publication (announcement) date: 2024-03-28

Application number: US18012391

Application date: 2022-07-28

Applicant: Google LLC

Inventor: Yu Zhang, Yu-An Chung, Wei Han, Chung-Cheng Chiu, Weikeng Qin, Ruoming Pang, Yonghui Wu

IPC: G06N3/0455

CPC classification number: G06N3/0455

Abstract: Provided are improved end-to-end self-supervised pre-training frameworks that leverage a combination of contrastive and masked modeling loss terms. In particular, the present disclosure provides a framework that combines contrastive learning and masked modeling, where the former trains the model to discretize input data (e.g., continuous signals such as continuous speech signals) into a finite set of discriminative tokens, and the latter trains the model to learn contextualized representations via solving a masked prediction task consuming the discretized tokens. In contrast to certain existing masked modeling-based pre-training frameworks, which rely on an iterative re-clustering and re-training process, or other existing frameworks, which concatenate two separately trained modules, the proposed framework can enable a model to be optimized in an end-to-end fashion by solving the two self-supervised tasks (the contrastive task and masked modeling) simultaneously.

8.

Invention grant
Streaming automatic speech recognition with non-streaming model distillation (in force)

Publication (announcement) number: US11804212B2

Publication (announcement) date: 2023-10-31

Application number: US17348118

Application date: 2021-06-15

Applicant: Google LLC

Inventor: Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

IPC: G10L15/06, G10L15/08, G10L15/18, G06N3/04, G06N3/045

CPC classification number: G10L15/063, G06N3/045, G10L15/083, G10L15/18

Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.
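The pseudo-labeling step in this abstract can be sketched as below. The abstract does not say how the outputs of several teachers are combined into one transcription, so the majority vote here is purely an assumption made for illustration; the teacher callables and the function names are hypothetical as well.

```python
from collections import Counter


def pseudo_label(utterance, teachers):
    """Transcribe one unlabeled utterance with every teacher, then combine.

    Majority voting is an assumed combination rule, not one taken from the
    patent abstract.
    """
    transcripts = [teacher(utterance) for teacher in teachers]
    return Counter(transcripts).most_common(1)[0][0]


def build_student_training_set(utterances, teachers):
    """Pair each unlabeled utterance with its teacher-generated transcription."""
    return [(utt, pseudo_label(utt, teachers)) for utt in utterances]


# Stand-in "teachers": lookup tables playing the role of non-streaming ASR models.
teacher_1 = {"utt1": "hello world", "utt2": "good morning"}.get
teacher_2 = {"utt1": "hello world", "utt2": "good mourning"}.get
teacher_3 = {"utt1": "hello word", "utt2": "good morning"}.get

student_pairs = build_student_training_set(
    ["utt1", "utt2"], [teacher_1, teacher_2, teacher_3]
)
```

The resulting `student_pairs` list is what a streaming student model would then be trained on in place of human-labeled data.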

9.

Invention grant
Streaming object detection within sensor data (in force)

Publication (announcement) number: US11774596B2

Publication (announcement) date: 2023-10-03

Application number: US17901224

Application date: 2022-09-01

Applicant: Google LLC

Inventor: Jonathon Shlens, Vijay Vasudevan, Jiquan Ngiam, Wei Han, Zhifeng Chen, Brandon Chauloon Yang, Benjamin James Caine, Zhengdong Zhang, Christoph Sprunk, Ouais Alsharif, Junhua Mao, Chen Wu

IPC: G01S17/89, G06T17/00, G06V20/10, G06N3/048, G06V10/764, G06V20/56

CPC classification number: G01S17/89, G06N3/048, G06T17/00, G06V10/764, G06V20/10, G06V20/56

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing data generated by a sensing system that rotationally senses an environment. In one aspect, a method comprises partitioning a predetermined period of time into a plurality of sub-periods, wherein the predetermined period of time is a period of time for which data generated by the sensing system constitutes a complete rotational sensing of the environment; for each sub-period: receiving current data generated by the sensing system during the sub-period and characterizing a respective partial scene of the environment; processing the current data using an object detection neural network to generate a current object detection output that is specific to the respective partial scene of the environment.

10.

Invention publication
Unsupervised Data Selection via Discrete Speech Representation for Automatic Speech Recognition (under examination, published)

Publication (announcement) number: US20240013777A1

Publication (announcement) date: 2024-01-11

Application number: US18320458

Application date: 2023-05-19

Applicant: Google LLC

Inventor: Zhiyun Lu, Yu Zhang, Wei Han, Yongqiang Wang, Parisa Haghani, Zhehuai Chen

IPC: G10L15/16, G10L15/06

CPC classification number: G10L15/16, G10L15/063

Abstract: A method includes obtaining a corpus of unlabeled training data including a plurality of spoken utterances, where each corresponding spoken utterance of the plurality of spoken utterances includes audio data characterizing the corresponding spoken utterance. The method also includes receiving a target domain. The method also includes selecting, using a contrastive data selection model, a subset of the utterances from the corpus of unlabeled training data that correspond to the target domain. The method includes training an automatic speech recognition (ASR) model on the subset of utterances.
