Emitting Word Timings with End-to-End Models

    Publication No.: US20240321263A1

    Publication Date: 2024-09-26

    Application No.: US18680797

    Filing Date: 2024-05-31

    Applicant: Google LLC

    IPC Classes: G10L15/06 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word, and the second constrained alignment is aligned with the ground truth alignment for the end of the respective word. The method also includes constraining an attention head of a second-pass decoder by applying the first and second constrained alignments.
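The preprocessing the abstract describes can be illustrated with a minimal sketch. The placeholder symbol, the toy word-piece split, and the frame-index representation below are assumptions for illustration, not the patent's actual implementation:

```python
PLACEHOLDER = "<w>"  # assumed placeholder symbol marking a word boundary

def word_pieces(word):
    # Toy word-piece split: the first character is the beginning word piece,
    # the remainder is the ending word piece. A real system would use a
    # trained word-piece model.
    return [word[0], word[1:]] if len(word) > 1 else [word]

def build_constrained_alignments(transcript, word_times):
    """For each word, insert the placeholder symbol before it, then pin the
    beginning word piece to the word's start frame (first constrained
    alignment) and the ending word piece to its end frame (second
    constrained alignment)."""
    tokens, constraints = [], []
    for word, (start, end) in zip(transcript.split(), word_times):
        tokens.append(PLACEHOLDER)
        pieces = word_pieces(word)
        for i, piece in enumerate(pieces):
            tokens.append(piece)
            if i == 0:
                constraints.append((piece, start))  # first constrained alignment
            if i == len(pieces) - 1:
                constraints.append((piece, end))    # second constrained alignment
    return tokens, constraints

tokens, constraints = build_constrained_alignments(
    "hello world", [(0, 12), (13, 30)])
```

The resulting `constraints` pairs are the kind of targets that could be applied to an attention head of a second-pass decoder so that attention peaks at the ground-truth word boundaries.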

    Systems and Methods for Training Dual-Mode Machine-Learned Speech Recognition Models

    Publication No.: US20230237993A1

    Publication Date: 2023-07-27

    Application No.: US18011571

    Filing Date: 2021-10-01

    Applicant: Google LLC

    IPC Classes: G10L15/16 G10L15/32 G10L15/22

    CPC Classes: G10L15/16 G10L15/32 G10L15/22

    Abstract: Systems and methods of the present disclosure are directed to a computing system, including one or more processors and a machine-learned multi-mode speech recognition model configured to operate in a streaming recognition mode or a contextual recognition mode. The computing system can perform operations including obtaining speech data and a ground truth label and processing the speech data using the contextual recognition mode to obtain contextual prediction data. The operations can include evaluating a difference between the contextual prediction data and the ground truth label and processing the speech data using the streaming recognition mode to obtain streaming prediction data. The operations can include evaluating a difference between the streaming prediction data and the ground truth label, and a difference between the contextual and streaming prediction data. The operations can include adjusting parameters of the speech recognition model.
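The training objective the abstract outlines combines three evaluated differences: each mode against the ground truth, plus a term comparing the streaming predictions to the contextual ones (an in-place distillation pattern). The sketch below is a hedged illustration; the mean-squared-error stand-in, the loss weighting, and the function names are assumptions, not the patent's actual losses:

```python
def mse(a, b):
    # Toy stand-in for a sequence loss; a real ASR system would use a
    # transducer or cross-entropy loss over token distributions.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def dual_mode_losses(contextual_pred, streaming_pred, ground_truth,
                     distill_weight=0.5):
    """Combine the three differences the abstract describes into one
    training objective."""
    contextual_loss = mse(contextual_pred, ground_truth)  # contextual vs. label
    streaming_loss = mse(streaming_pred, ground_truth)    # streaming vs. label
    distill_loss = mse(streaming_pred, contextual_pred)   # streaming vs. contextual
    return contextual_loss + streaming_loss + distill_weight * distill_loss

total = dual_mode_losses([1.0, 2.0], [1.0, 1.0], [1.0, 2.0])
```

Minimizing the combined total adjusts the shared parameters so the streaming mode benefits from the stronger contextual mode while both fit the ground truth.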

    Emitting Word Timings with End-to-End Models

    Publication No.: US20230206907A1

    Publication Date: 2023-06-29

    Application No.: US18167050

    Filing Date: 2023-02-09

    Applicant: Google LLC

    IPC Classes: G10L15/06 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word, and the second constrained alignment is aligned with the ground truth alignment for the end of the respective word. The method also includes constraining an attention head of a second-pass decoder by applying the first and second constrained alignments.

    Neural Architecture Search with Factorized Hierarchical Search Space

    Publication No.: US20220101090A1

    Publication Date: 2022-03-31

    Application No.: US17495398

    Filing Date: 2021-10-06

    Applicant: Google LLC

    Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to run relatively faster and use relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.
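A factorized hierarchical search space of the kind described can be pictured as a network divided into blocks, where each block independently samples its own layer configuration. The sketch below is illustrative only; the specific option values and block count are assumptions, not the patent's search space:

```python
import random

# Assumed per-block choices; each block picks its own combination,
# which is what permits layer diversity throughout the network.
BLOCK_CHOICES = {
    "conv_op":     ["conv3x3", "conv5x5", "mbconv3", "mbconv6"],
    "kernel_size": [3, 5],
    "num_layers":  [1, 2, 3, 4],
}

def sample_architecture(num_blocks, rng):
    """Sample one candidate architecture: each block chooses independently,
    so the total search space size factorizes as
    (choices per block) ** num_blocks."""
    return [
        {name: rng.choice(options) for name, options in BLOCK_CHOICES.items()}
        for _ in range(num_blocks)
    ]

arch = sample_architecture(7, random.Random(0))
```

Because the choices factorize across blocks, the space stays searchable while still allowing, say, an early block with large kernels and a late block with cheap depthwise layers, which is the flexibility-versus-size balance the abstract highlights.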