Invention Publication
- Patent Title: TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES
- Application No.: US18518075
- Application Date: 2023-11-22
- Publication No.: US20240169715A1
- Publication Date: 2024-05-23
- Inventor: Lucas Klaus Beyer, Pavel Izmailov, Simon Kornblith, Alexander Kolesnikov, Mathilde Caron, Xiaohua Zhai, Matthias Johannes Lorenz Minderer, Ibrahim Alabdulmohsin, Michael Tobias Tschannen, Filip Pavetic
- Applicant: GOOGLE LLC
- Applicant Address: US CA Mountain View
- Assignee: GOOGLE LLC
- Current Assignee: GOOGLE LLC
- Current Assignee Address: US CA Mountain View
- Main IPC: G06V10/82
- IPC: G06V10/82; G06V10/22

Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that is configured to process an input image to generate a network output for the input image. In one aspect, a method comprises, at each of a plurality of training steps: obtaining a plurality of training images for the training step; obtaining, for each of the plurality of training images, a respective target output; and selecting, from a plurality of image patch generation schemes, an image patch generation scheme for the training step, wherein, given an input image, each of the plurality of image patch generation schemes generates a different number of patches of the input image, and wherein each patch comprises a respective subset of the pixels of the input image.
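The per-step scheme selection described in the abstract can be illustrated with a minimal sketch. It is not the claimed method. It assumes that each scheme is a square, non-overlapping patch size and that the scheme is drawn uniformly at random, neither of which the abstract states. The names `PATCH_SIZES`, `patchify`, `select_scheme`, and `training_step_patches` are hypothetical, and the sketch stops at patch generation because the abstract does not describe the later steps.

```python
import random

# Hypothetical schemes: each one is a square patch size in pixels (an assumption;
# the abstract only says that each scheme yields a different number of patches).
PATCH_SIZES = (8, 16, 32)


def patchify(image, patch_size):
    """Split an H x W image (a list of pixel rows) into non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Each patch holds a subset of the image's pixels.
            patches.append([row[left:left + patch_size]
                            for row in image[top:top + patch_size]])
    return patches


def select_scheme(rng):
    """Pick one scheme for the current training step (uniform sampling is an assumption)."""
    return rng.choice(PATCH_SIZES)


def training_step_patches(images, rng):
    """Select one scheme for the step and apply it to every training image in the batch."""
    patch_size = select_scheme(rng)
    return patch_size, [patchify(image, patch_size) for image in images]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[r * 32 + c for c in range(32)] for r in range(32)]  # toy 32 x 32 image
    for step in range(3):
        size, batch = training_step_patches([image, image], rng)
        print(f"step {step}: patch size {size}, {len(batch[0])} patches per image")
```

With a 32 x 32 input, the three assumed schemes produce 16, 4, and 1 patches respectively, so the number of patches per image changes from one training step to the next while the images themselves do not.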