Description-driven Task-oriented Dialogue Modeling

    Publication number: US20240220732A1

    Publication date: 2024-07-04

    Application number: US18148045

    Filing date: 2022-12-29

    Applicant: Google LLC

    CPC classification number: G06F40/35 G06F16/367

    Abstract: Example methods include determining an input schema representation for a task. The schema representation comprises natural language descriptions of slots and intents, wherein respective indices are associated with each of the slot descriptions and each of the intent descriptions. The methods include determining a contextual representation comprising a concatenation of a history of dialog sequences exchanged between a user and a service agent, wherein the dialog sequences describe a context for the task. The methods include training, based on a concatenation of the input schema representation and the contextual representation, a sequence-to-sequence language model to predict a sequence of dialog states for an input task, wherein the sequence of dialog states comprises an assignment of values to slots for which the user has indicated a preference in dialog sequences corresponding to the input task. The methods include providing the trained sequence-to-sequence language model.
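    A minimal sketch of how such a schema-plus-context input might be serialized for a sequence-to-sequence model. The function name and the formatting conventions (index prefixes, speaker brackets) are illustrative assumptions, not the format claimed in the patent:

    ```python
    def build_model_input(slot_descriptions, intent_descriptions, dialog_history):
        # Index each natural-language slot and intent description, as the
        # abstract describes; the index lets the model refer back to a slot.
        slots = " ".join(f"{i}:{d}" for i, d in enumerate(slot_descriptions))
        intents = " ".join(f"i{i}:{d}" for i, d in enumerate(intent_descriptions))
        # Contextual representation: concatenation of the dialog history.
        context = " ".join(f"[{speaker}] {utt}" for speaker, utt in dialog_history)
        # The seq2seq model is trained on schema and context concatenated.
        return f"{slots} {intents} {context}"
    ```

    For example, one slot description plus a one-turn history yields a single flat string the model can consume.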

    PREDICTING NEURAL NETWORK PERFORMANCE USING NEURAL NETWORK GAUSSIAN PROCESS

    Publication number: US20220019856A1

    Publication date: 2022-01-20

    Application number: US17377142

    Filing date: 2021-07-15

    Applicant: Google LLC

    Abstract: A method for predicting performance of a neural network (NN) is described. The method includes receiving a training data set having a set of training samples; receiving a validation data set having a set of validation pairs; initializing (i) a validation-training kernel matrix representing similarities between the validation inputs in the validation data set and the training inputs in the training data set and (ii) a training-training kernel matrix representing similarities across the training inputs within the training data set; generating a final updated validation-training kernel matrix and a final updated training-training kernel matrix; performing the following operations at least once: generating predicted validation outputs for the validation inputs, and updating an accuracy score of the NN based on the predicted validation outputs and the validation outputs; and outputting the updated accuracy score as a final accuracy score representing performance of the NN.
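    A sketch of how the two kernel matrices could be used to predict validation outputs and an accuracy score. This is standard Gaussian-process posterior-mean regression under an assumed one-hot label encoding, not the patented procedure itself:

    ```python
    import numpy as np

    def gp_predict(K_vt, K_tt, y_train, jitter=1e-6):
        """Posterior-mean predictions for validation inputs.

        K_vt: (n_val, n_train) validation-training kernel matrix.
        K_tt: (n_train, n_train) training-training kernel matrix.
        y_train: (n_train, n_classes) one-hot training labels.
        """
        n = K_tt.shape[0]
        # Standard GP regression mean: K_vt (K_tt + jitter*I)^-1 y_train.
        return K_vt @ np.linalg.solve(K_tt + jitter * np.eye(n), y_train)

    def accuracy_score(pred, y_val):
        # Fraction of validation points whose predicted class matches the label.
        return float(np.mean(pred.argmax(axis=1) == y_val.argmax(axis=1)))
    ```

    The jitter term keeps the solve numerically stable when the kernel matrix is nearly singular.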

    VIDEO-TEXT MODELING WITH ZERO-SHOT TRANSFER FROM CONTRASTIVE CAPTIONERS

    Publication number: US20250124708A1

    Publication date: 2025-04-17

    Application number: US18694604

    Filing date: 2023-12-08

    Applicant: Google LLC

    Abstract: Provided is an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. Some example implementations include a model which can be referred to as VideoCoCa. Example implementations reuse a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, aspects of the present disclosure leverage findings that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to “flattened frame embeddings”, yielding a strong zero-shot transfer baseline for many video-text tasks.
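    The "flattened frame embeddings" idea can be sketched as follows: per-frame token embeddings are reshaped into one long token sequence, and a learned query set attends over it. This NumPy version is a simplified single-head illustration under assumed shapes, not the CoCa pooler implementation:

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def pool_flattened_frames(frame_tokens, queries):
        """Attentional pooling over flattened frame embeddings.

        frame_tokens: (T, N, D) token embeddings for T frames of N tokens each.
        queries: (Q, D) learned pooler queries (contrastive or generative).
        Returns a (Q, D) pooled video representation.
        """
        T, N, D = frame_tokens.shape
        flat = frame_tokens.reshape(T * N, D)          # flatten frames into one sequence
        attn = softmax(queries @ flat.T / np.sqrt(D))  # (Q, T*N) attention weights
        return attn @ flat
    ```

    Because the pooler only sees a token sequence, the same layers trained on single images can consume multi-frame input without architectural changes, which is the zero-shot transfer observation the abstract describes.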

    GENERATING LABELED TRAINING DATA USING A PRE-TRAINED LANGUAGE MODEL NEURAL NETWORK

    Publication number: US20230196105A1

    Publication date: 2023-06-22

    Application number: US18082934

    Filing date: 2022-12-16

    Applicant: Google LLC

    CPC classification number: G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating labeled training data using a pre-trained language model neural network. In particular, the language model neural network can generate the text input in a new labeled training example from an input sequence that includes (i) one or more context inputs and (ii) a text label that identifies the ground truth category for the new labeled training example.
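    A sketch of the label-conditioned input sequence such a method might build: the context inputs and the ground-truth label are combined into a prompt, and the language model's continuation becomes the text of the new labeled example. The prompt format here is an assumption for illustration:

    ```python
    def build_generation_prompt(context_inputs, label):
        # Condition the language model on context examples plus the target
        # ground-truth label; the model's generated continuation is the
        # text input of a new labeled training example in that category.
        context = "\n".join(context_inputs)
        return f"{context}\nLabel: {label}\nText:"
    ```

    The generated text is then paired with `label` to form the new training example.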

    Generating diverse and natural text-to-speech samples

    Publication number: US11475874B2

    Publication date: 2022-10-18

    Application number: US17163007

    Filing date: 2021-01-29

    Applicant: Google LLC

    Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
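    The "assigns a quantized embedding to the latent feature" step can be sketched as a standard vector-quantization lookup: the latent is mapped to its nearest entry in a learned codebook. This is a generic VQ illustration under assumed shapes, not the patented training procedure:

    ```python
    import numpy as np

    def assign_quantized_embedding(latent, codebook):
        """Map an extracted latent feature to its nearest codebook embedding.

        latent: (D,) latent feature from an aligned spectrogram portion.
        codebook: (K, D) learned quantized embeddings.
        """
        distances = np.linalg.norm(codebook - latent, axis=1)
        return codebook[int(np.argmin(distances))]
    ```

    During decoding, the per-unit speech embeddings and their quantized embeddings would be concatenated, as the abstract describes.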

    Demonstration-driven Scalable Task-oriented Dialogue Modeling

    Publication number: US20240221731A1

    Publication date: 2024-07-04

    Application number: US18148037

    Filing date: 2022-12-29

    Applicant: Google LLC

    CPC classification number: G10L15/1815 G06F40/35 G10L15/063 G10L2015/0633

    Abstract: Example methods include determining an input prompt comprising an utterance labeled with a sequence of slot-value pairs, wherein the sequence of slot-value pairs indicates possible slots and values in the utterance, and wherein the utterance relates to a task. The methods include determining a contextual representation comprising a concatenation of a history of utterances exchanged between a user and a service agent. The utterances describe a context for the task. The methods include training, based on a concatenation of the input prompt and the contextual representation, a sequence-to-sequence language model to predict a sequence of dialog states for an input task. The sequence of dialog states comprises an assignment of values to slots for which the user has indicated a preference in dialog sequences. The methods include providing the trained sequence-to-sequence language model.
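    A sketch of a demonstration-style prompt of the kind the abstract describes: one utterance annotated with its slot-value pairs, concatenated with the dialog history. The delimiter and annotation syntax are illustrative assumptions:

    ```python
    def build_demonstration_prompt(utterance, slot_value_pairs, history):
        # One labeled demonstration: the example utterance annotated with its
        # slot-value pairs, followed by the concatenated dialog context.
        labels = "; ".join(f"{slot}={value}" for slot, value in slot_value_pairs)
        demonstration = f"[example] {utterance} => {labels}"
        context = " ".join(history)
        return f"{demonstration} {context}"
    ```

    The seq2seq model is then trained on this concatenation to emit the dialog state for the current turn.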

    Systems and Methods for Pretraining Image Processing Models

    Publication number: US20230281400A1

    Publication date: 2023-09-07

    Application number: US17685774

    Filing date: 2022-03-03

    Applicant: Google LLC

    CPC classification number: G06F40/58 G06F40/284 G06V10/766 G06V30/10

    Abstract: Example embodiments of the present disclosure relate to systems and methods for pretraining image-processing models on weakly-supervised image-text pairs. The pretraining can include receiving a training sequence for the machine-learned image-processing model. The training sequence can include text tokens and image tokens. A prefix sequence can contain the image tokens. A remainder sequence can include a remainder set of the text tokens. The pretraining can include determining, using the prefix sequence as an input to the machine-learned image-processing model, an objective based on recovery of the remainder sequence. The pretraining can include updating one or more learnable parameters of the machine-learned image-processing model based on the objective.
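    The prefix/remainder split described above can be sketched as follows; the `text_in_prefix` parameter is an added assumption to show that the prefix could also absorb a leading slice of text tokens:

    ```python
    def split_into_prefix_and_remainder(image_tokens, text_tokens, text_in_prefix=0):
        # The prefix sequence contains the image tokens (optionally plus a
        # leading slice of the text tokens); the training objective is to
        # recover the remainder sequence of text tokens from the prefix.
        prefix = list(image_tokens) + list(text_tokens[:text_in_prefix])
        remainder = list(text_tokens[text_in_prefix:])
        return prefix, remainder
    ```

    A loss over `remainder` given `prefix` then drives the parameter updates the abstract mentions.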

    Federated Learning with Partially Trainable Networks

    Publication number: US20230214642A1

    Publication date: 2023-07-06

    Application number: US17568933

    Filing date: 2022-01-05

    Applicant: Google LLC

    CPC classification number: G06N3/08

    Abstract: Example aspects of the present disclosure provide a novel, resource-efficient approach for federated machine learning with partially trainable networks (PTNs). The system can determine a first set of training parameters from a plurality of parameters of the global model. Additionally, the system can generate a random seed, using a random number generator, based on a set of frozen parameters. Moreover, the system can transmit, respectively to a plurality of client computing devices, the first set of training parameters and the random seed. Furthermore, the system can receive, respectively from the plurality of client computing devices, updates to one or more parameters in the first set of training parameters. Subsequently, the system can aggregate the updates to the one or more parameters that are respectively received from the plurality of client computing devices. The system can modify one or more global parameters of the global model based on the aggregation.
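    The communication savings come from shipping only a seed for the frozen parameters and averaging the trainable ones. A minimal sketch of those two pieces, using plain FedAvg-style averaging as an assumed aggregation rule:

    ```python
    import random

    def frozen_parameters_from_seed(seed, count):
        # Clients regenerate the frozen parameters locally from the shared
        # random seed, so those parameters never travel over the network.
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(count)]

    def aggregate_updates(client_updates):
        # Average each trainable parameter's update across the
        # participating clients before modifying the global model.
        n_clients = len(client_updates)
        return [sum(update[i] for update in client_updates) / n_clients
                for i in range(len(client_updates[0]))]
    ```

    Every client calling `frozen_parameters_from_seed` with the same seed reconstructs identical frozen parameters, which is what makes transmitting only the seed sufficient.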

    Generating Diverse and Natural Text-To-Speech Samples

    Publication number: US20220246132A1

    Publication date: 2022-08-04

    Application number: US17163007

    Filing date: 2021-01-29

    Applicant: Google LLC

    Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
