Description-driven Task-oriented Dialogue Modeling

    Publication number: US20240220732A1

    Publication date: 2024-07-04

    Application number: US18148045

    Filing date: 2022-12-29

    Applicant: Google LLC

    CPC classification number: G06F40/35 G06F16/367

    Abstract: Example methods include determining an input schema representation for a task. The schema representation comprises natural language descriptions of slots and intents, wherein respective indices are associated with each of the slot descriptions and each of the intent descriptions. The methods include determining a contextual representation comprising a concatenation of a history of dialog sequences exchanged between a user and a service agent, wherein the dialog sequences describe a context for the task. The methods include training, based on a concatenation of the input schema representation and the contextual representation, a sequence-to-sequence language model to predict a sequence of dialog states for an input task, wherein the sequence of dialog states comprises an assignment of values to slots for which the user has indicated a preference in dialog sequences corresponding to the input task. The methods include providing the trained sequence-to-sequence language model.
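    A minimal sketch of how such a schema-plus-context input might be serialized for a sequence-to-sequence model. The function name and the formatting conventions (index prefixes, speaker brackets) are illustrative assumptions, not the format claimed in the patent:

    ```python
    def build_model_input(slot_descriptions, intent_descriptions, dialog_history):
        # Index each natural-language slot and intent description, as the
        # abstract describes; the index lets the model refer back to a slot.
        slots = " ".join(f"{i}:{d}" for i, d in enumerate(slot_descriptions))
        intents = " ".join(f"i{i}:{d}" for i, d in enumerate(intent_descriptions))
        # Contextual representation: concatenation of the dialog history.
        context = " ".join(f"[{speaker}] {utt}" for speaker, utt in dialog_history)
        # The seq2seq model is trained on schema and context concatenated.
        return f"{slots} {intents} {context}"
    ```

    For example, one slot description plus a one-turn history yields a single flat string the model can consume.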

    PREDICTING NEURAL NETWORK PERFORMANCE USING NEURAL NETWORK GAUSSIAN PROCESS

    Publication number: US20220019856A1

    Publication date: 2022-01-20

    Application number: US17377142

    Filing date: 2021-07-15

    Applicant: Google LLC

    Abstract: A method for predicting performance of a neural network (NN) is described. The method includes receiving a training data set having a set of training samples; receiving a validation data set having a set of validation pairs; initializing (i) a validation-training kernel matrix representing similarities between the validation inputs in the validation data set and the training inputs in the training data set and (ii) a training-training kernel matrix representing similarities across the training inputs within the training data set; generating a final updated validation-training kernel matrix and a final updated training-training kernel matrix; performing the following operations at least once: generating predicted validation outputs for the validation inputs, and updating an accuracy score of the NN based on the predicted validation outputs and the validation outputs; and outputting the updated accuracy score as a final accuracy score representing performance of the NN.
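    A sketch of how the two kernel matrices could be used to predict validation outputs and an accuracy score. This is standard Gaussian-process posterior-mean regression under an assumed one-hot label encoding, not the patented procedure itself:

    ```python
    import numpy as np

    def gp_predict(K_vt, K_tt, y_train, jitter=1e-6):
        """Posterior-mean predictions for validation inputs.

        K_vt: (n_val, n_train) validation-training kernel matrix.
        K_tt: (n_train, n_train) training-training kernel matrix.
        y_train: (n_train, n_classes) one-hot training labels.
        """
        n = K_tt.shape[0]
        # Standard GP regression mean: K_vt (K_tt + jitter*I)^-1 y_train.
        return K_vt @ np.linalg.solve(K_tt + jitter * np.eye(n), y_train)

    def accuracy_score(pred, y_val):
        # Fraction of validation points whose predicted class matches the label.
        return float(np.mean(pred.argmax(axis=1) == y_val.argmax(axis=1)))
    ```

    The jitter term keeps the solve numerically stable when the kernel matrix is nearly singular.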

    VIDEO-TEXT MODELING WITH ZERO-SHOT TRANSFER FROM CONTRASTIVE CAPTIONERS

    Publication number: US20250124708A1

    Publication date: 2025-04-17

    Application number: US18694604

    Filing date: 2023-12-08

    Applicant: Google LLC

    Abstract: Provided is an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. Some example implementations include a model which can be referred to as VideoCoCa. Example implementations reuse a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, aspects of the present disclosure leverage findings that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to “flattened frame embeddings”, yielding a strong zero-shot transfer baseline for many video-text tasks.
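    The "flattened frame embeddings" idea can be sketched as follows: per-frame token embeddings are reshaped into one long token sequence, and a learned query set attends over it. This NumPy version is a simplified single-head illustration under assumed shapes, not the CoCa pooler implementation:

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def pool_flattened_frames(frame_tokens, queries):
        """Attentional pooling over flattened frame embeddings.

        frame_tokens: (T, N, D) token embeddings for T frames of N tokens each.
        queries: (Q, D) learned pooler queries (contrastive or generative).
        Returns a (Q, D) pooled video representation.
        """
        T, N, D = frame_tokens.shape
        flat = frame_tokens.reshape(T * N, D)          # flatten frames into one sequence
        attn = softmax(queries @ flat.T / np.sqrt(D))  # (Q, T*N) attention weights
        return attn @ flat
    ```

    Because the pooler only sees a token sequence, the same layers trained on single images can consume multi-frame input without architectural changes, which is the zero-shot transfer observation the abstract describes.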

    GENERATING LABELED TRAINING DATA USING A PRE-TRAINED LANGUAGE MODEL NEURAL NETWORK

    Publication number: US20230196105A1

    Publication date: 2023-06-22

    Application number: US18082934

    Filing date: 2022-12-16

    Applicant: Google LLC

    CPC classification number: G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating labeled training data using a pre-trained language model neural network. In particular, the language model neural network can generate the text input in a new labeled training example from an input sequence that includes (i) one or more context inputs and (ii) a text label that identifies the ground truth category for the new labeled training example.
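    A sketch of the label-conditioned input sequence such a method might build: the context inputs and the ground-truth label are combined into a prompt, and the language model's continuation becomes the text of the new labeled example. The prompt format here is an assumption for illustration:

    ```python
    def build_generation_prompt(context_inputs, label):
        # Condition the language model on context examples plus the target
        # ground-truth label; the model's generated continuation is the
        # text input of a new labeled training example in that category.
        context = "\n".join(context_inputs)
        return f"{context}\nLabel: {label}\nText:"
    ```

    The generated text is then paired with `label` to form the new training example.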

    Generating diverse and natural text-to-speech samples

    Publication number: US11475874B2

    Publication date: 2022-10-18

    Application number: US17163007

    Filing date: 2021-01-29

    Applicant: Google LLC

    Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
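    The "assigns a quantized embedding to the latent feature" step can be sketched as a standard vector-quantization lookup: the latent is mapped to its nearest entry in a learned codebook. This is a generic VQ illustration under assumed shapes, not the patented training procedure:

    ```python
    import numpy as np

    def assign_quantized_embedding(latent, codebook):
        """Map an extracted latent feature to its nearest codebook embedding.

        latent: (D,) latent feature from an aligned spectrogram portion.
        codebook: (K, D) learned quantized embeddings.
        """
        distances = np.linalg.norm(codebook - latent, axis=1)
        return codebook[int(np.argmin(distances))]
    ```

    During decoding, the per-unit speech embeddings and their quantized embeddings would be concatenated, as the abstract describes.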

    Demonstration-driven Scalable Task-oriented Dialogue Modeling

    Publication number: US20240221731A1

    Publication date: 2024-07-04

    Application number: US18148037

    Filing date: 2022-12-29

    Applicant: Google LLC

    CPC classification number: G10L15/1815 G06F40/35 G10L15/063 G10L2015/0633

    Abstract: Example methods include determining an input prompt comprising an utterance labeled with a sequence of slot-value pairs, wherein the sequence of slot-value pairs indicates possible slots and values in the utterance, and wherein the utterance relates to a task. The methods include determining a contextual representation comprising a concatenation of a history of utterances exchanged between a user and a service agent. The utterances describe a context for the task. The methods include training, based on a concatenation of the input prompt and the contextual representation, a sequence-to-sequence language model to predict a sequence of dialog states for an input task. The sequence of dialog states comprises an assignment of values to slots for which the user has indicated a preference in dialog sequences. The methods include providing the trained sequence-to-sequence language model.
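    A sketch of a demonstration-style prompt of the kind the abstract describes: one utterance annotated with its slot-value pairs, concatenated with the dialog history. The delimiter and annotation syntax are illustrative assumptions:

    ```python
    def build_demonstration_prompt(utterance, slot_value_pairs, history):
        # One labeled demonstration: the example utterance annotated with its
        # slot-value pairs, followed by the concatenated dialog context.
        labels = "; ".join(f"{slot}={value}" for slot, value in slot_value_pairs)
        demonstration = f"[example] {utterance} => {labels}"
        context = " ".join(history)
        return f"{demonstration} {context}"
    ```

    The seq2seq model is then trained on this concatenation to emit the dialog state for the current turn.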

    Systems and Methods for Pretraining Image Processing Models

    Publication number: US20230281400A1

    Publication date: 2023-09-07

    Application number: US17685774

    Filing date: 2022-03-03

    Applicant: Google LLC

    CPC classification number: G06F40/58 G06F40/284 G06V10/766 G06V30/10

    Abstract: Example embodiments of the present disclosure relate to systems and methods for pretraining image-processing models on weakly-supervised image-text pairs. The pretraining can include receiving a training sequence for the machine-learned image-processing model. The training sequence can include text tokens and image tokens. A prefix sequence can contain the image tokens. A remainder sequence can include a remainder set of the text tokens. The pretraining can include determining, using the prefix sequence as an input to the machine-learned image-processing model, an objective based on recovery of the remainder sequence. The pretraining can include updating one or more learnable parameters of the machine-learned image-processing model based on the objective.
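    The prefix/remainder split described above can be sketched as follows; the `text_in_prefix` parameter is an added assumption to show that the prefix could also absorb a leading slice of text tokens:

    ```python
    def split_into_prefix_and_remainder(image_tokens, text_tokens, text_in_prefix=0):
        # The prefix sequence contains the image tokens (optionally plus a
        # leading slice of the text tokens); the training objective is to
        # recover the remainder sequence of text tokens from the prefix.
        prefix = list(image_tokens) + list(text_tokens[:text_in_prefix])
        remainder = list(text_tokens[text_in_prefix:])
        return prefix, remainder
    ```

    A loss over `remainder` given `prefix` then drives the parameter updates the abstract mentions.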

    Federated Learning with Partially Trainable Networks

    Publication number: US20230214642A1

    Publication date: 2023-07-06

    Application number: US17568933

    Filing date: 2022-01-05

    Applicant: Google LLC

    CPC classification number: G06N3/08

    Abstract: Example aspects of the present disclosure provide a novel, resource-efficient approach for federated machine learning with partially trainable networks (PTNs). The system can determine a first set of training parameters from a plurality of parameters of the global model. Additionally, the system can generate a random seed, using a random number generator, based on a set of frozen parameters. Moreover, the system can transmit, respectively to a plurality of client computing devices, the first set of training parameters and the random seed. Furthermore, the system can receive, respectively from the plurality of client computing devices, updates to one or more parameters in the first set of training parameters. Subsequently, the system can aggregate the updates to the one or more parameters that are respectively received from the plurality of client computing devices. The system can modify one or more global parameters of the global model based on the aggregation.
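    The communication savings come from shipping only a seed for the frozen parameters and averaging the trainable ones. A minimal sketch of those two pieces, using plain FedAvg-style averaging as an assumed aggregation rule:

    ```python
    import random

    def frozen_parameters_from_seed(seed, count):
        # Clients regenerate the frozen parameters locally from the shared
        # random seed, so those parameters never travel over the network.
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(count)]

    def aggregate_updates(client_updates):
        # Average each trainable parameter's update across the
        # participating clients before modifying the global model.
        n_clients = len(client_updates)
        return [sum(update[i] for update in client_updates) / n_clients
                for i in range(len(client_updates[0]))]
    ```

    Every client calling `frozen_parameters_from_seed` with the same seed reconstructs identical frozen parameters, which is what makes transmitting only the seed sufficient.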

    Generating Diverse and Natural Text-To-Speech Samples

    Publication number: US20220246132A1

    Publication date: 2022-08-04

    Application number: US17163007

    Filing date: 2021-01-29

    Applicant: Google LLC

    Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
