OPEN-VOCABULARY OBJECT DETECTION IN IMAGES

    公开(公告)号:US20250148759A1

    公开(公告)日:2025-05-08

    申请号:US19014029

    申请日:2025-01-08

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.

    AUTO-REGRESSIVE VIDEO GENERATION NEURAL NETWORKS

    公开(公告)号:US20250061608A1

    公开(公告)日:2025-02-20

    申请号:US18938770

    申请日:2024-11-06

    Applicant: Google LLC

    Abstract: A method for generating a video is described. The method includes: generating an initial output video including multiple frames, each of the frames having multiple channels; identifying a partitioning of the initial output video into a set of channel slices that are indexed according to a particular slice order, each channel slice being a down sampling of a channel stack from a set of channel stacks; initializing, for each channel stack in the set of channel stacks, a set of fully-generated channel slices; repeatedly processing, using an encoder and a decoder, a current output video to generate a next fully-generated channel slice to be added to the current set of fully-generated channel slices; generating, for each channel index, a respective fully-generated channel stack using the respective fully generated channel slices; and generating a fully-generated output video using the fully-generated channel stacks.

    Conditional axial transformer layers for high-fidelity image transformation

    公开(公告)号:US12182965B2

    公开(公告)日:2024-12-31

    申请号:US17449162

    申请日:2021-09-28

    Applicant: Google LLC

    Abstract: Apparatus and methods relate to receiving an input image comprising an array of pixels, wherein the input image is associated with a first characteristic; applying a neural network to transform the input image to an output image associated with a second characteristic by generating, by an encoder and for each pixel of the array of pixels of the input image, an encoded pixel, providing, to a decoder, the array of encoded pixels, applying, by the decoder, axial attention to decode a given pixel, wherein the axial attention comprises a row attention or a column attention applied to one or more previously decoded pixels in rows or columns preceding a row or column associated with the given pixel, wherein the row or column attention mixes information within a respective row or column, and maintains independence between respective different rows or different columns; and generating, by the neural network, the output image.

    Auto-regressive video generation neural networks

    公开(公告)号:US12142015B2

    公开(公告)日:2024-11-12

    申请号:US17609668

    申请日:2020-05-22

    Applicant: Google LLC

    Abstract: A method for generating a video is described. The method includes: generating an initial output video including multiple frames, each of the frames having multiple channels; identifying a partitioning of the initial output video into a set of channel slices that are indexed according to a particular slice order, each channel slice being a down sampling of a channel stack from a set of channel stacks; initializing, for each channel stack in the set of channel stacks, a set of fully-generated channel slices; repeatedly processing, using an encoder and a decoder, a current output video to generate a next fully-generated channel slice to be added to the current set of fully-generated channel slices; generating, for each channel index, a respective fully-generated channel stack using the respective fully generated channel slices; and generating a fully-generated output video using the fully-generated channel stacks.

Patent Agency Ranking