-
Publication No.: US20220108423A1
Publication Date: 2022-04-07
Application No.: US17449162
Filing Date: 2021-09-28
Applicant: Google LLC
Inventor: Manoj Kumar Sivaraj, Dirk Weissenborn, Nal Emmerich Kalchbrenner
Abstract: Apparatus and methods relate to receiving an input image comprising an array of pixels, wherein the input image is associated with a first characteristic; applying a neural network to transform the input image to an output image associated with a second characteristic by generating, by an encoder and for each pixel of the array of pixels of the input image, an encoded pixel, providing, to a decoder, the array of encoded pixels, applying, by the decoder, axial attention to decode a given pixel, wherein the axial attention comprises a row attention or a column attention applied to one or more previously decoded pixels in rows or columns preceding a row or column associated with the given pixel, wherein the row or column attention mixes information within a respective row or column, and maintains independence between respective different rows or different columns; and generating, by the neural network, the output image.
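
The row and column attention described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the patented implementation: the function names are hypothetical, queries, keys, and values are all taken to be the raw pixel features (no learned projections), and the masking that restricts attention to previously decoded pixels is left out. The sketch shows only the stated property that attention mixes information within a row or column while keeping different rows or columns independent.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(seq):
    # Scaled dot-product self-attention over one row or one column.
    # Assumption: queries, keys and values are the inputs themselves.
    dim = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)])
    return out

def row_attention(grid):
    # grid is H x W x D (nested lists); each row is processed independently.
    return [attend(row) for row in grid]

def column_attention(grid):
    # Transpose to W x H x D, attend along each column, transpose back.
    columns = [list(col) for col in zip(*grid)]
    attended = [attend(col) for col in columns]
    return [list(row) for row in zip(*attended)]
```

Because each call to `attend` sees a single row or column, changing the pixels of one row cannot affect the row-attention output of any other row, and likewise for columns.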
-
Publication No.: US20190286984A1
Publication Date: 2019-09-19
Application No.: US16351104
Filing Date: 2019-03-12
Applicant: Google LLC
Inventor: Vijay Vasudevan, Mohammad Norouzi, George Edward Dahl, Manoj Kumar Sivaraj
Abstract: A method of determining a final architecture for a neural network (NN) for performing a particular NN task is described. The method includes: maintaining a sequence of classifiers, wherein each classifier has been trained to process an input candidate architecture and to assign a score label to the input candidate architecture that defines whether the input candidate architecture is accepted or rejected from further consideration; repeatedly performing the following operations: sampling, from a search space, a batch of candidate architectures; for each candidate architecture: determining whether the candidate architecture is accepted by all of the classifiers in the sequence of classifiers; in response to a determination that the candidate architecture is accepted by all classifiers, adding the candidate architecture to a surviving set of candidate architectures; and selecting a candidate architecture from the surviving set as the final architecture for the neural network for performing the particular NN task.
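
The sample, filter, and select loop in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: all function names are hypothetical, the classifiers are plain Python predicates standing in for trained models, the classifier sequence is fixed for the whole search, and the final selection rule is supplied by the caller because the abstract does not specify one.

```python
def accepted_by_all(candidate, classifiers):
    # A candidate survives only if every classifier in the sequence accepts it.
    return all(classifier(candidate) for classifier in classifiers)

def search(sample_fn, classifiers, rounds, batch_size, select_fn):
    # sample_fn: draws one candidate architecture from the search space.
    # select_fn: picks the final architecture from the surviving set
    #            (hypothetical; the abstract leaves the rule unspecified).
    surviving = []
    for _ in range(rounds):
        batch = [sample_fn() for _ in range(batch_size)]
        surviving.extend(c for c in batch if accepted_by_all(c, classifiers))
    return select_fn(surviving) if surviving else None

# Toy usage: architectures are (depth, width) pairs and the predicates
# below are hand-written stand-ins for trained accept/reject classifiers.
toy_space = iter([(1, 16), (4, 64), (8, 128), (3, 32)])
toy_classifiers = [
    lambda arch: arch[0] >= 3,              # rejects very shallow networks
    lambda arch: arch[0] * arch[1] <= 512,  # rejects oversized networks
]
final = search(lambda: next(toy_space), toy_classifiers,
               rounds=1, batch_size=4,
               select_fn=lambda s: min(s, key=lambda a: a[0] * a[1]))
```

Checking cheap classifiers first and stopping at the first rejection, as `all` does here, keeps the cost per rejected candidate low.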
-
Publication No.: US12182965B2
Publication Date: 2024-12-31
Application No.: US17449162
Filing Date: 2021-09-28
Applicant: Google LLC
Inventor: Manoj Kumar Sivaraj, Dirk Weissenborn, Nal Emmerich Kalchbrenner
IPC: G06T3/4046, G06N3/08, G06T3/4023
Abstract: Apparatus and methods relate to receiving an input image comprising an array of pixels, wherein the input image is associated with a first characteristic; applying a neural network to transform the input image to an output image associated with a second characteristic by generating, by an encoder and for each pixel of the array of pixels of the input image, an encoded pixel, providing, to a decoder, the array of encoded pixels, applying, by the decoder, axial attention to decode a given pixel, wherein the axial attention comprises a row attention or a column attention applied to one or more previously decoded pixels in rows or columns preceding a row or column associated with the given pixel, wherein the row or column attention mixes information within a respective row or column, and maintains independence between respective different rows or different columns; and generating, by the neural network, the output image.
-
Publication No.: US20240257511A1
Publication Date: 2024-08-01
Application No.: US18419170
Filing Date: 2024-01-22
Applicant: Google LLC
Inventor: Manoj Kumar Sivaraj, Neil Matthew Tinmouth Houlsby, Mostafa Dehghani
Abstract: One example aspect of the present disclosure is directed to a neural network for machine vision. The neural network may include a stem block that includes a set of stem layers. The neural network may additionally include a visual transformer block. The set of stem layers may include a patch layer, a first normalization layer, an embedding layer, and a second normalization layer. The patch layer subdivides an input image into a set of image patches. The first normalization layer generates a set of normalized image patches by performing a first normalization process on each image patch of the set of image patches. The patch layer feeds forward to the first normalization layer. The embedding layer generates a set of vector embeddings. Each vector embedding of the set of embedding vectors is a projection of a corresponding normalized image patch from the set of normalized image patches onto a visual token. The first normalization layer feeds forward to the embedding layer. The second normalization layer generates a set of normalized vector embeddings by performing a second normalization process on each vector embedding of the set of vector embeddings. The embedding layer feeds forward to the second normalization layer. The transformer block enables one or more machine vision tasks for the input image based on the set of normalized vectors. The second normalization layer feeds forward to the transformer block.
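
The layer ordering of the stem block (patch layer, first normalization, embedding, second normalization) can be sketched in plain Python. Several choices here are assumptions not stated in the abstract: the normalization process is taken to be layer normalization without learned scale or bias, the image is single-channel with sides divisible by the patch size, and the embedding is a bias-free linear projection. All names are hypothetical.

```python
import math

def patchify(image, p):
    # Split an H x W image (nested lists of floats) into flattened p x p patches.
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(p) for j in range(p)]
        for r in range(0, h, p)
        for c in range(0, w, p)
    ]

def layer_norm(vec, eps=1e-6):
    # Assumed normalization: zero mean, unit variance per vector.
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def embed(patch, weights):
    # Linear projection; weights has one row per output dimension.
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]

def stem(image, p, weights):
    patches = patchify(image, p)                     # patch layer
    normed = [layer_norm(x) for x in patches]        # first normalization
    embedded = [embed(x, weights) for x in normed]   # embedding layer
    return [layer_norm(e) for e in embedded]         # second normalization
```

The list returned by `stem` corresponds to the normalized vector embeddings that the abstract feeds forward to the transformer block.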
-