TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES

Invention Publication

US20240169715A1 TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES (Status: pending, published)


Patent Title: TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES
Application No.: US18518075

Application Date: 2023-11-22
Publication No.: US20240169715A1

Publication Date: 2024-05-23
Inventor: Lucas Klaus Beyer, Pavel Izmailov, Simon Kornblith, Alexander Kolesnikov, Mathilde Caron, Xiaohua Zhai, Matthias Johannes Lorenz Minderer, Ibrahim Alabdulmohsin, Michael Tobias Tschannen, Filip Pavetic
Applicant: GOOGLE LLC
Applicant Address: US CA Mountain View
Assignee: GOOGLE LLC
Current Assignee: GOOGLE LLC
Current Assignee Address: US CA Mountain View
Main IPC: G06V10/82
IPC: G06V10/82; G06V10/22

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that is configured to process an input image to generate a network output for the input image. In one aspect, a method comprises, at each of a plurality of training steps: obtaining a plurality of training images for the training step; obtaining, for each of the plurality of training images, a respective target output; and selecting, from a plurality of image patch generation schemes, an image patch generation scheme for the training step, wherein, given an input image, each of the plurality of image patch generation schemes generates a different number of patches of the input image, and wherein each patch comprises a respective subset of the pixels of the input image.
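The per-step selection of an image patch generation scheme described in the abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each scheme is a non-overlapping square patch size drawn uniformly at random from `PATCH_SIZES`, that images are single-channel nested lists whose sides are divisible by every patch size, and that target outputs are simply carried alongside the patched images. The names `make_patches`, `training_step_patches`, and `PATCH_SIZES` are illustrative and do not come from the publication; the network forward pass, loss, and parameter update are omitted.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical simplification: a single-channel H x W image as nested lists.
Image = List[List[float]]
Patch = List[float]

# Hypothetical set of patch generation schemes. For the same input image,
# each patch size yields a different number of patches.
PATCH_SIZES = (8, 16, 32)


def make_patches(image: Image, patch_size: int) -> List[Patch]:
    """Split an image into non-overlapping patch_size x patch_size patches.

    Each patch is a subset of the image's pixels, flattened in row-major order.
    Patches are returned left to right, top to bottom.
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


def training_step_patches(
    images: Sequence[Image], targets: Sequence[int], rng: random.Random
) -> Tuple[int, List[Tuple[List[Patch], int]]]:
    """Select one patch generation scheme for this step and apply it to the batch.

    Every image in the step uses the same scheme; each target output is
    paired with its image's patches. The model update itself is not shown.
    """
    if len(images) != len(targets):
        raise ValueError("need exactly one target output per training image")
    patch_size = rng.choice(PATCH_SIZES)
    patched = [make_patches(img, patch_size) for img in images]
    return patch_size, list(zip(patched, targets))
```

For a 64 x 64 image, patch sizes 8, 16, and 32 give 64, 16, and 4 patches respectively, so the sequence length presented to the transformer varies from one training step to the next. The abstract does not say how the network's patch embedding handles different patch sizes, so that part is not modeled here.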

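IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V10/00	Arrangements for image or video recognition or understanding (character recognition in images or videos G06V30/10)
G06V10/70	.using pattern recognition or machine learning (optical pattern recognition or electronic computing G06V10/88)
G06V10/82	..using neural networks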