-
Publication No.: US20240404238A1
Publication Date: 2024-12-05
Application No.: US18698997
Filing Date: 2022-10-05
Applicant: Google LLC
Inventor: Jiahui Yu , Vijay Vasudevan , Alexander Yeong-Shiuh Ku , Yonghui Wu , Jason Michael Baldridge , Yuanzhong Xu , Jing Yu Koh , Thang Minh Luong , Gunjan Baid , Zirui Wang , Han Zhang , Xin Li
IPC: G06V10/28 , G06F40/284 , G06V10/764 , G06V10/766 , G06V10/82
Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pre-training a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
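The core encode-to-discrete-tokens step described above can be illustrated with a minimal nearest-neighbor codebook lookup. This is an illustrative sketch, not the patented implementation; the function name and shapes are assumptions:

```python
import numpy as np

def quantize_to_tokens(features, codebook):
    """Map each patch feature vector to the index of its nearest codebook entry.

    features: (num_patches, dim) array of encoder outputs (raster order).
    codebook: (vocab_size, dim) array of learned code vectors.
    Returns a 1-D array of discrete token ids that an autoregressive
    Transformer could then be trained to predict.
    """
    # Squared L2 distance from every feature to every codebook vector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# Toy example: 4 patch features quantized against a 3-entry codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(3, 8))
features = np.stack([codebook[2], codebook[0], codebook[0], codebook[1]])
tokens = quantize_to_tokens(features, codebook)
```

Because each toy feature equals one codebook vector exactly, the lookup recovers the matching indices; in the real system the codebook and encoder are learned jointly.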
-
Publication No.: US12067646B2
Publication Date: 2024-08-20
Application No.: US17467628
Filing Date: 2021-09-07
Applicant: Google LLC
Inventor: Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee
IPC: G06T11/00 , G06F18/214 , G06F18/22 , G06N3/08 , G10L15/26
CPC classification number: G06T11/00 , G06F18/2148 , G06F18/22 , G06N3/08 , G10L15/26
Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
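The attract/repel training objective described above is the shape of a contrastive (InfoNCE-style) loss over batched pairs. The sketch below is a hypothetical stand-in, not the claimed method; names and the temperature value are assumptions:

```python
import numpy as np

def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: each anchor row is attracted to its matching
    positive row and repelled from every other row in the batch."""
    # Normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal.
    return -np.mean(np.diag(log_probs))

# Aligned pairs (same textual description) give low loss;
# shuffled pairs (different descriptions) give high loss.
x = np.eye(4)                                    # 4 mutually orthogonal embeddings
low = contrastive_loss(x, x)
high = contrastive_loss(x, np.roll(x, 1, axis=0))
```

The same loss shape can be applied to both image-to-image and text-to-image pairs, matching the "plurality of corresponding pairs" in the abstract.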
-
Publication No.: US20240112088A1
Publication Date: 2024-04-04
Application No.: US18520083
Filing Date: 2023-11-27
Applicant: Google LLC
Inventor: Jiahui Yu , Xin Li , Han Zhang , Vijay Vasudevan , Alexander Yeong-Shiuh Ku , Jason Michael Baldridge , Yuanzhong Xu , Jing Yu Koh , Thang Minh Luong , Gunjan Baid , Zirui Wang , Yonghui Wu
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pretraining a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
-
Publication No.: US20230274143A1
Publication Date: 2023-08-31
Application No.: US18173985
Filing Date: 2023-02-24
Applicant: Google LLC
Inventor: Zizhao Zhang , Zifeng Wang , Chen-Yu Lee , Ruoxi Sun , Sayna Ebrahimi , Xiaoqi Ren , Guolong Su , Vincent Perot , Tomas Pfister , Han Zhang
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A method for rehearsal-free continual learning includes obtaining a set of training samples where each training sample in the set of training samples is associated with a respective task of a plurality of different tasks. The method includes obtaining a task-invariant prompt representative of learned knowledge common to each respective task of the plurality of different tasks. The method includes, for each respective task of the plurality of different tasks, obtaining a respective task-specific prompt representative of learned knowledge specific to the respective task. The method includes, during each of one or more training iterations, for each respective training sample in the set of training samples, selecting the respective task-specific prompt representative of the respective task of the respective training sample and training a model using the task-invariant prompt and the selected respective task-specific prompt.
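The prompt-selection step above can be sketched as prepending a shared prompt plus the selected per-task prompt to the sample before it reaches the (typically frozen) backbone. All names are illustrative, and the 1-D "token vectors" are a readability simplification:

```python
def build_prompted_input(sample_tokens, task_id, task_invariant_prompt, task_specific_prompts):
    """Prepend the task-invariant prompt and the prompt selected for this
    sample's task; the result is what would be fed to the model.

    All arguments are lists of token vectors (lists of floats).
    """
    selected = task_specific_prompts[task_id]  # select by the sample's task
    return task_invariant_prompt + selected + sample_tokens

# Toy usage: one shared prompt token, one prompt token per task.
g_prompt = [[0.0]]                                 # task-invariant prompt
e_prompts = {"taskA": [[1.0]], "taskB": [[2.0]]}   # task-specific prompts
x = [[9.0]]                                        # sample embedding
out = build_prompted_input(x, "taskB", g_prompt, e_prompts)
```

Only the small prompt parameters change per task, which is what makes the approach rehearsal-free: no stored examples from earlier tasks are replayed.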
-
Publication No.: US20240337839A1
Publication Date: 2024-10-10
Application No.: US18130601
Filing Date: 2023-04-04
Applicant: Google LLC
Inventor: Ozan Cakmakci , Oscar Alberto Martinez , Eliezer Glik , Han Zhang
IPC: G02B27/01
CPC classification number: G02B27/0172 , G02B2027/0178
Abstract: A head mounted display includes an eyeglasses frame, a lens framed in the eyeglasses frame, and a light engine disposed in the eyeglasses frame. The lens includes an optical shell comprising a world-facing spherical surface and an opposing eye-facing surface and a curved lightguide disposed in the optical shell. The curved lightguide includes an incoupler surface, a first freeform surface facing the world-facing spherical surface, and a second freeform surface facing the eye-facing surface. The lens further includes a first low refractive index region disposed between the first freeform surface and a first conformal freeform surface of the optical shell and a second low refractive index region disposed between the second freeform surface and a second conformal freeform surface of the optical shell.
-
Publication No.: US20230351192A1
Publication Date: 2023-11-02
Application No.: US18348587
Filing Date: 2023-07-07
Applicant: Google LLC
Inventor: Zizhao Zhang , Sercan Omer Arik , Tomas Jon Pfister , Han Zhang
IPC: G06N3/084 , G06N20/00 , G06N5/04 , G06V10/762 , G06V10/771 , G06V10/774 , G06V10/776 , G06V10/82
CPC classification number: G06N3/084 , G06N20/00 , G06N5/04 , G06V10/763 , G06V10/771 , G06V10/774 , G06V10/776 , G06V10/82
Abstract: A method for training a model comprises obtaining a set of labeled training samples each associated with a given label. For each labeled training sample, the method includes generating a pseudo label and estimating a weight of the labeled training sample indicative of an accuracy of the given label. The method also includes determining whether the weight of the labeled training sample satisfies a weight threshold. When the weight of the labeled training sample satisfies the weight threshold, the method includes adding the labeled training sample to a set of cleanly labeled training samples. Otherwise, the method includes adding the labeled training sample to a set of mislabeled training samples. The method includes training the model with the set of cleanly labeled training samples using corresponding given labels and the set of mislabeled training samples using corresponding pseudo labels.
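The weight-threshold partition at the center of the method above reduces to a simple split; this is an illustrative sketch with assumed names, not the patented weight-estimation procedure (which the abstract leaves unspecified):

```python
def split_by_weight(samples, weights, threshold=0.5):
    """Partition labeled samples into 'clean' and 'mislabeled' sets.

    A weight estimates how trustworthy a sample's given label is. Samples
    whose weight satisfies the threshold keep their given label; the rest
    are trained with pseudo labels instead.
    """
    clean, mislabeled = [], []
    for sample, w in zip(samples, weights):
        (clean if w >= threshold else mislabeled).append(sample)
    return clean, mislabeled

clean, noisy = split_by_weight(["a", "b", "c"], [0.9, 0.2, 0.7], threshold=0.5)
```

Training then uses the given labels for `clean` and the generated pseudo labels for `noisy`, so no sample is discarded outright.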
-
Publication No.: US20220375205A1
Publication Date: 2022-11-24
Application No.: US17664402
Filing Date: 2022-05-20
Applicant: Google LLC
Inventor: Zizhao Zhang , Han Zhang , Long Zhao , Tomas Pfister
IPC: G06V10/77 , G06V10/764 , G06V10/22 , G06V10/44
Abstract: A method includes receiving image data including a series of image patches of an image. The method includes generating, using a first set of transformers of a vision transformer (V-T) model, a first set of higher order feature representations based on the series of image patches and aggregating the first set of higher order feature representations into a second set of higher order feature representations that is smaller than the first set. The method includes generating, using a second set of transformers of the V-T model, a third set of higher order feature representations based on the second set of higher order feature representations and aggregating the third set of higher order feature representations into a fourth set of higher order feature representations that is smaller than the third set. The method includes generating, using the V-T model, an image classification of the image based on the fourth set.
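The repeated shrink-the-token-set pattern above can be illustrated with a stand-in aggregation step. Here the learned aggregation module is replaced by simple group averaging, purely to show how each stage produces a smaller set of higher-order representations:

```python
import numpy as np

def aggregate(features, out_len):
    """Aggregate a set of feature vectors into a smaller set by averaging
    consecutive groups (a stand-in for the learned aggregation module)."""
    groups = np.array_split(features, out_len, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

# 16 patch features -> 8 -> 4, mirroring the two aggregation stages
# between transformer sets (feature dim kept at 1 for readability).
feats = np.arange(16, dtype=float).reshape(16, 1)
stage2 = aggregate(feats, 8)   # shape (8, 1): smaller second set
stage3 = aggregate(stage2, 4)  # shape (4, 1): smaller fourth set
```

Each aggregation reduces sequence length before the next set of transformers runs, which is what makes the hierarchy cheaper than processing all patches at full length throughout.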
-
Publication No.: US20250069382A1
Publication Date: 2025-02-27
Application No.: US18726881
Filing Date: 2023-01-05
Applicant: Google LLC
Inventor: Yinxiao Li , Zhengzhong Tu , Hossein Talebi , Han Zhang , Feng Yang , Peyman Milanfar
IPC: G06V10/82 , G06V10/764 , G06V10/77
Abstract: Provided are machine learning systems and models featuring resolution-flexible multi-axis attention blocks. In particular, the present disclosure provides example multi-axis MLP based architectures (example implementations of which can be generally referred to as MAXIM) that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. In some implementations, MAXIM can use a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, some example implementations of MAXIM can contain two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning.
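The spatially-gated MLP mentioned above can be sketched in miniature: the channels are split in half and one half gates the other after a spatial mix. The learned spatial projection is replaced here by a mean over positions; this is an assumption-laden illustration, not the MAXIM block:

```python
import numpy as np

def spatial_gating(x):
    """Split channels in half and gate one half with a spatial mix of the other.

    x: (positions, channels) with an even channel count.
    Returns (positions, channels // 2). The mean over positions stands in
    for a learned spatial projection (illustrative only).
    """
    u, v = np.split(x, 2, axis=-1)
    v = v.mean(axis=0, keepdims=True)  # spatial mixing stand-in
    return u * v                       # elementwise gate

out = spatial_gating(np.ones((4, 6)))  # 4 positions, 6 channels -> (4, 3)
```

The gating path is what lets the block modulate features by global spatial context at MLP-level cost rather than attention-level cost.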
-
Publication No.: US20240362830A1
Publication Date: 2024-10-31
Application No.: US18770154
Filing Date: 2024-07-11
Applicant: Google LLC
Inventor: Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee
IPC: G06T11/00 , G06F18/214 , G06F18/22 , G06N3/08 , G10L15/26
CPC classification number: G06T11/00 , G06F18/2148 , G06F18/22 , G06N3/08 , G10L15/26
Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
-
Publication No.: US20250022269A1
Publication Date: 2025-01-16
Application No.: US18902546
Filing Date: 2024-09-30
Applicant: Google LLC
Inventor: Yinxiao Li , Feng Yang , Peyman Milanfar , Han Zhang , Zhengzhong Tu , Hossein Talebi
Abstract: Provided is an efficient and scalable attention model that can be referred to as multi-axis attention. Example implementations can include two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. The present disclosure also presents a new architectural element by effectively blending the proposed multi-axis attention model with convolutions. In addition, the present disclosure proposes a simple hierarchical vision backbone, example implementations of which can be referred to as MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to “see” globally throughout the entire network, even in earlier, high-resolution stages.
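The "blocked local" half of the multi-axis attention above amounts to partitioning the spatial grid into non-overlapping windows so attention runs within each window; the sketch below shows that partition (the dilated "grid" pattern comes from transposing which axes are split). Names and shapes are illustrative assumptions:

```python
import numpy as np

def block_partition(x, b):
    """Partition an (H, W) grid into non-overlapping b x b blocks.

    Each output row holds one block's elements, i.e. one local attention
    window. Swapping which reshaped axes are grouped instead yields the
    dilated 'grid' windows used for the global-attention axis.
    """
    h, w = x.shape
    return x.reshape(h // b, b, w // b, b).transpose(0, 2, 1, 3).reshape(-1, b * b)

x = np.arange(16).reshape(4, 4)   # a 4x4 grid of token indices
blocks = block_partition(x, 2)    # four 2x2 local windows
```

Attention inside each window costs only window-size-squared, which is how the combined local and dilated-global pattern stays linear in input resolution.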
-