Vector-Quantized Image Modeling
    Invention Publication

    Publication Number: US20240112088A1

    Publication Date: 2024-04-04

    Application Number: US18520083

    Filing Date: 2023-11-27

    Applicant: Google LLC

    CPC classification number: G06N20/00

    Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pretraining a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
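The core of the VQGAN-style tokenization described above is mapping each encoder output to its nearest entry in a learned codebook, yielding discrete image tokens. A minimal NumPy sketch of that quantization step follows; the shapes, sizes, and the `quantize` helper are illustrative, not from the patent, and the codebook-learning improvements (e.g., factorized or normalized codes) are omitted.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    features: (N, D) array of encoder outputs (e.g., ViT patch embeddings).
    codebook: (K, D) array of learned code vectors.
    Returns (indices, quantized) where quantized[i] == codebook[indices[i]].
    """
    # Squared Euclidean distance between every feature and every code.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d.argmin(axis=1)
    return indices, codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))    # K=8 codes, D=4 dims (toy sizes)
features = rng.normal(size=(16, 4))   # 16 patch embeddings
idx, quant = quantize(features, codebook)
```

The resulting `idx` sequence is what a Transformer would then model autoregressively in raster order.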

    Complementary Prompting For Rehearsal-Free Continual Learning

    Publication Number: US20230274143A1

    Publication Date: 2023-08-31

    Application Number: US18173985

    Filing Date: 2023-02-24

    Applicant: Google LLC

    CPC classification number: G06N3/08

    Abstract: A method for rehearsal-free continual learning includes obtaining a set of training samples where each training sample in the set of training samples is associated with a respective task of a plurality of different tasks. The method includes obtaining a task-invariant prompt representative of learned knowledge common to each respective task of the plurality of different tasks. The method includes, for each respective task of the plurality of different tasks, obtaining a respective task-specific prompt representative of learned knowledge specific to the respective task. The method includes, during each of one or more training iterations, for each respective training sample in the set of training samples, selecting the respective task-specific prompt representative of the respective task of the respective training sample and training a model using the task-invariant prompt and the selected respective task-specific prompt.
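The prompt-selection step above can be sketched as prepending a shared (task-invariant) prompt and the selected task-specific prompt to the input token embeddings before they enter the model. The helper name, dimensions, and random prompts below are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # embedding dimension (toy)
g_prompt = rng.normal(size=(2, D))       # task-invariant prompt (shared)
e_prompts = {t: rng.normal(size=(3, D))  # one task-specific prompt per task
             for t in range(4)}

def build_input(x_embed, task_id):
    """Prepend the shared prompt and the selected task's prompt to the input."""
    return np.concatenate([g_prompt, e_prompts[task_id], x_embed], axis=0)

x = rng.normal(size=(10, D))             # 10 input token embeddings
tokens = build_input(x, task_id=2)       # (2 + 3 + 10, D)
```

During training, gradients would update only the prompts (and a classifier head), which is what makes the approach rehearsal-free: no stored examples from earlier tasks are replayed.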

    CURVED LIGHTGUIDE IN A SEE-THROUGH SHELL
    Invention Publication

    Publication Number: US20240337839A1

    Publication Date: 2024-10-10

    Application Number: US18130601

    Filing Date: 2023-04-04

    Applicant: GOOGLE LLC

    CPC classification number: G02B27/0172 G02B2027/0178

    Abstract: A head mounted display includes an eyeglasses frame, a lens framed in the eyeglasses frame, and a light engine disposed in the eyeglasses frame. The lens includes an optical shell comprising a world-facing spherical surface and an opposing eye-facing surface and a curved lightguide disposed in the optical shell. The curved lightguide includes an incoupler surface, a first freeform surface facing the world-facing spherical surface, and a second freeform surface facing the eye-facing surface. The lens further includes a first low refractive index region disposed between the first freeform surface and a first conformal freeform surface of the optical shell and a second low refractive index region disposed between the second freeform surface and a second conformal freeform surface of the optical shell.

    Aggregating Nested Vision Transformers

    Publication Number: US20220375205A1

    Publication Date: 2022-11-24

    Application Number: US17664402

    Filing Date: 2022-05-20

    Applicant: Google LLC

    Abstract: A method includes receiving image data including a series of image patches of an image. The method includes generating, using a first set of transformers of a vision transformer (V-T) model, a first set of higher order feature representations based on the series of image patches and aggregating the first set of higher order feature representations into a second set of higher order feature representations that is smaller than the first set. The method includes generating, using a second set of transformers of the V-T model, a third set of higher order feature representations based on the second set of higher order feature representations and aggregating the third set of higher order feature representations into a fourth set of higher order feature representations that is smaller than the third set. The method includes generating, using the V-T model, an image classification of the image based on the fourth set.
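The repeated "generate, then aggregate into a smaller set" pattern above can be illustrated with a toy aggregation that average-pools neighboring blocks of feature representations into a coarser grid. The pooling choice and all shapes are assumptions for illustration; the patent does not specify this particular aggregation.

```python
import numpy as np

def aggregate(feats, pool=2):
    """Merge a grid of block features into a coarser (smaller) grid by
    average-pooling pool x pool groups of neighboring blocks."""
    B, H, W, D = feats.shape
    feats = feats.reshape(B, H // pool, pool, W // pool, pool, D)
    return feats.mean(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 16))   # first set: 8x8 grid of block features
s2 = aggregate(x)                    # second, smaller set: (1, 4, 4, 16)
s3 = aggregate(s2)                   # after the next transformer stage: (1, 2, 2, 16)
```

In the described model, a set of transformers runs between each aggregation, so each stage operates on progressively fewer, higher-order feature representations before the final classification.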

    Machine Learning Models Featuring Resolution-Flexible Multi-Axis Attention Blocks

    Publication Number: US20250069382A1

    Publication Date: 2025-02-27

    Application Number: US18726881

    Filing Date: 2023-01-05

    Applicant: Google LLC

    Abstract: Provided are machine learning systems and models featuring resolution-flexible multi-axis attention blocks. In particular, the present disclosure provides example multi-axis MLP-based architectures (example implementations of which can be generally referred to as MAXIM) that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. In some implementations, MAXIM can use a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, some example implementations of MAXIM can contain two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning.
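The spatially-gated MLP mentioned above follows the gMLP pattern: split the channels in half, mix one half across the spatial axis with a learned projection, and use the result to gate the other half elementwise. The sketch below is a minimal single-axis version with toy shapes; MAXIM's multi-axis variant and its cross-gating block are not reproduced here.

```python
import numpy as np

def spatial_gating(x, w, b):
    """gMLP-style spatial gating unit.

    x: (N, C) tokens; w: (N, N) spatial projection; b: (N, 1) bias.
    Splits channels into (u, v), mixes v across the N tokens, and
    returns u gated elementwise by the mixed v -> (N, C/2).
    """
    u, v = np.split(x, 2, axis=-1)   # (N, C/2) each
    v = w @ v + b                    # linear mixing along the spatial axis
    return u * v

rng = np.random.default_rng(0)
N, C = 16, 8
x = rng.normal(size=(N, C))
w = rng.normal(size=(N, N)) * 0.01   # small init keeps the gate near its bias
b = np.ones((N, 1))                  # bias near 1 so the gate starts "open"
y = spatial_gating(x, w, b)          # (16, 4)
```

Initializing the spatial projection near zero with a unit bias is the common gMLP trick: the block starts close to an identity on `u` and learns spatial mixing gradually.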

    Multi-Axis Vision Transformer
    Invention Application

    Publication Number: US20250022269A1

    Publication Date: 2025-01-16

    Application Number: US18902546

    Filing Date: 2024-09-30

    Applicant: Google LLC

    Abstract: Provided is an efficient and scalable attention model that can be referred to as multi-axis attention. Example implementations can include two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. The present disclosure also presents a new architectural element by effectively blending the proposed multi-axis attention model with convolutions. In addition, the present disclosure proposes a simple hierarchical vision backbone, example implementations of which can be referred to as MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to “see” globally throughout the entire network, even in earlier, high-resolution stages.
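The two partitioning schemes behind multi-axis attention can be shown with plain array reshapes: blocked partitioning groups tokens into non-overlapping local windows, while grid partitioning groups tokens that are spaced evenly across the whole map, giving dilated global mixing at linear cost. The window/grid sizes below are toy values, and the attention computation itself is omitted.

```python
import numpy as np

def block_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows for
    blocked *local* attention: tokens attend only within their window."""
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    """Split into a g x g grid of dilated tokens for *global* attention:
    each group takes one token from every window, so attention within a
    group mixes information across the whole map."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
local_groups = block_partition(x, 4)   # (4, 16, 16): 4 windows of 16 tokens
global_groups = grid_partition(x, 4)   # (4, 16, 16): 4 dilated groups
```

Running standard self-attention within each group of both partitions, then un-partitioning, yields global-local mixing whose cost scales linearly with the number of tokens rather than quadratically.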
