TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that is configured to process an input image to generate a network output for the input image. In one aspect, a method comprises, at each of a plurality of training steps: obtaining a plurality of training images for the training step; obtaining, for each of the plurality of training images, a respective target output; and selecting, from a plurality of image patch generation schemes, an image patch generation scheme for the training step, wherein, given an input image, each of the plurality of image patch generation schemes generates a different number of patches of the input image, and wherein each patch comprises a respective subset of the pixels of the input image.
Information query
Patent Agency Ranking
0/0