Contrastive Pre-Training for Language Tasks

    Publication Number: US20230015737A1

    Publication Date: 2023-01-19

    Application Number: US17947843

    Application Date: 2022-09-19

    Applicant: Google LLC

    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
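
    A rough sketch of the corruption step described in this abstract is shown below: a random subset (here 15%) of token ids is masked out and replaced with samples from a stand-in generator, and the resulting binary labels are what the encoder would be trained to predict. The unigram sampler and all names (MASK_RATE, sample_generator, corrupt_example) are illustrative assumptions; the disclosure uses a small masked language model as the generator.

```python
# Minimal, hedged sketch of replaced-token detection data construction.
import numpy as np

MASK_RATE = 0.15  # e.g., 15% of the original input tokens are masked out

def sample_generator(vocab_size: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for a small masked language model: sample plausible replacement tokens."""
    return rng.integers(0, vocab_size, size=n)

def corrupt_example(tokens: np.ndarray, vocab_size: int, rng: np.random.Generator):
    """Replace a random subset of tokens and return (corrupted, is_replaced)."""
    mask = rng.random(len(tokens)) < MASK_RATE
    corrupted = tokens.copy()
    corrupted[mask] = sample_generator(vocab_size, int(mask.sum()), rng)
    # A sampled token can coincide with the original; those positions count
    # as "original" in the discriminator's binary labels.
    is_replaced = corrupted != tokens
    return corrupted, is_replaced

rng = np.random.default_rng(0)
tokens = rng.integers(0, 30522, size=128)          # one tokenized input sequence
corrupted, labels = corrupt_example(tokens, 30522, rng)
# The encoder would then be trained to predict `labels` from `corrupted`.
```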

    Task Augmentation and Self-Training for Improved Few-Shot Learning

    Publication Number: US20220383206A1

    Publication Date: 2022-12-01

    Application Number: US17826690

    Application Date: 2022-05-27

    Applicant: Google LLC

    Abstract: Systems and methods can leverage task-specific unlabeled data to improve downstream performance in data-constrained scenarios. Given a target task, a first technique proposed herein, which can be referred to as task augmentation, uses unlabeled text from the target domain to synthesize a large amount of in-domain training data for an auxiliary task. A second technique provides a self-training algorithm, where a model learns to improve itself using its predictions on unlabeled examples.
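
    The second technique lends itself to a short sketch: a model pseudo-labels unlabeled examples with its own confident predictions and is retrained on the enlarged set. The linear classifier, confidence threshold, and synthetic features below are illustrative assumptions, not the patent's setup (which targets text models), and the task-augmentation step is omitted.

```python
# Hedged sketch of a confidence-thresholded self-training loop.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Tiny synthetic stand-in for a data-constrained target task.
X_labeled = np.vstack([rng.normal(-1.0, 1.0, (20, 16)), rng.normal(1.0, 1.0, (20, 16))])
y_labeled = np.array([0] * 20 + [1] * 20)
X_unlabeled = np.vstack([rng.normal(-1.0, 1.0, (200, 16)), rng.normal(1.0, 1.0, (200, 16))])
CONFIDENCE = 0.9  # keep only confident pseudo-labels

X_train, y_train = X_labeled, y_labeled
for _ in range(3):  # a few self-training rounds
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba = model.predict_proba(X_unlabeled)
    keep = proba.max(axis=1) >= CONFIDENCE
    # Retrain on the labeled data plus the model's own confident predictions.
    X_train = np.vstack([X_labeled, X_unlabeled[keep]])
    y_train = np.concatenate([y_labeled, proba[keep].argmax(axis=1)])
```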

    Learning longer-term dependencies in neural network using auxiliary losses

    Publication Number: US11501168B2

    Publication Date: 2022-11-15

    Application Number: US16273041

    Application Date: 2019-02-11

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for structuring and training a recurrent neural network. This specification describes a technique that improves the ability to capture long-term dependencies in recurrent neural networks by adding an unsupervised auxiliary loss at one or more anchor points to the original objective. This auxiliary loss forces the network to either reconstruct previous events or predict next events in a sequence, making truncated backpropagation feasible for long sequences and also improving full backpropagation through time.
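
    A hedged sketch of the auxiliary-loss idea follows: at a randomly chosen anchor position, a small decoder is asked to reconstruct the subsequence preceding the anchor, starting from the anchor's hidden state, and that reconstruction loss is added to the main objective. The model sizes, anchor sampling, and loss weight are illustrative assumptions, not the patented configuration.

```python
# Hedged sketch: main sequence-classification loss plus an unsupervised
# reconstruction loss anchored at a random position in the sequence.
import torch
import torch.nn as nn

vocab, dim, aux_len, aux_weight = 100, 32, 8, 0.5
embed = nn.Embedding(vocab, dim)
encoder = nn.LSTM(dim, dim, batch_first=True)          # main recurrent network
classifier = nn.Linear(dim, 2)                          # main-task head
decoder = nn.LSTM(dim, dim, batch_first=True)           # auxiliary reconstruction head
readout = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (4, 64))               # (batch, sequence length)
labels = torch.randint(0, 2, (4,))
states, _ = encoder(embed(tokens))                      # hidden state at every position

# Main objective: classify the sequence from its final hidden state.
main_loss = nn.functional.cross_entropy(classifier(states[:, -1]), labels)

# Auxiliary objective: reconstruct the aux_len tokens preceding a random anchor,
# initializing the decoder from the encoder's hidden state at that anchor.
anchor = int(torch.randint(aux_len, tokens.size(1), (1,)))
targets = tokens[:, anchor - aux_len:anchor]
h0 = states[:, anchor].unsqueeze(0).contiguous()
dec_in = torch.cat([torch.zeros(4, 1, dim), embed(targets[:, :-1])], dim=1)
dec_out, _ = decoder(dec_in, (h0, torch.zeros_like(h0)))
aux_loss = nn.functional.cross_entropy(readout(dec_out).reshape(-1, vocab),
                                        targets.reshape(-1))

(main_loss + aux_weight * aux_loss).backward()          # combined training signal
```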

    Vector-Quantized Image Modeling
    Invention Publication

    Publication Number: US20240112088A1

    Publication Date: 2024-04-04

    Application Number: US18520083

    Application Date: 2023-11-27

    Applicant: Google LLC

    CPC classification number: G06N20/00

    Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pretraining a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
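
    The tokenization at the heart of this approach can be sketched as a nearest-neighbour codebook lookup: each patch embedding produced by the ViT-based encoder is replaced by the index of its closest codebook vector, and the resulting raster-ordered indices form the sequence the autoregressive Transformer models. The shapes, codebook size, and plain L2 lookup below are illustrative assumptions rather than the exact ViT-VQGAN design, which also changes the architecture and codebook learning.

```python
# Hedged sketch of the vector-quantization step: patch embeddings -> discrete tokens.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8192, 32))               # (codebook size, embedding dim)
patch_embeddings = rng.normal(size=(16 * 16, 32))    # one image as a 16x16 grid of patch embeddings

# Squared L2 distance to every codebook entry, without materializing the full
# (patches, codebook, dim) difference tensor.
d2 = ((patch_embeddings ** 2).sum(1, keepdims=True)
      - 2.0 * patch_embeddings @ codebook.T
      + (codebook ** 2).sum(1))
image_tokens = d2.argmin(axis=1)                     # discrete image tokens in raster order

# These 256 token ids are what the Transformer would be pretrained to predict
# autoregressively, p(t_i | t_1, ..., t_{i-1}); the codebook rows they index
# are what a decoder uses to reconstruct the image.
quantized = codebook[image_tokens]
```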

    Contrastive pre-training for language tasks

    Publication Number: US11449684B2

    Publication Date: 2022-09-20

    Application Number: US17026780

    Application Date: 2020-09-21

    Applicant: Google LLC

    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
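
    This granted patent shares its abstract with the application listed first above, so rather than repeat the corruption sketch, the snippet below shows the complementary half: training an encoder to classify every position of a corrupted sequence as original or replaced with a per-token binary loss. The tiny Transformer encoder and single-logit head are illustrative stand-ins, not the patented architecture.

```python
# Hedged sketch of the per-token original-vs-replaced discriminator objective.
import torch
import torch.nn as nn

vocab, dim = 30522, 64
encoder = nn.Sequential(
    nn.Embedding(vocab, dim),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2))
head = nn.Linear(dim, 1)                               # per-token replaced/original score

corrupted = torch.randint(0, vocab, (8, 128))          # corrupted token ids (see the earlier sketch)
is_replaced = torch.randint(0, 2, (8, 128)).float()    # labels from the corruption step

logits = head(encoder(corrupted)).squeeze(-1)          # (batch, sequence length)
loss = nn.functional.binary_cross_entropy_with_logits(logits, is_replaced)
loss.backward()
```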
