DETECTING OBJECTS IN IMAGES BY GENERATING SEQUENCES OF TOKENS

    Publication number: US20250139959A1

    Publication date: 2025-05-01

    Application number: US18690550

    Filing date: 2022-09-19

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for object detection using neural networks. In one aspect, one of the methods includes obtaining an input image; processing the input image using an object detection neural network to generate an output sequence that comprises a respective token at each of a plurality of time steps, wherein each token is selected from a vocabulary of tokens that comprises (i) a first set of tokens that each represent a respective discrete number from a set of discretized numbers and (ii) a second set of tokens that each represent a respective object category from a set of object categories; and generating, from the tokens in the output sequence, an object detection output for the input image.
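
    The last step of the abstract, turning the generated tokens into an object detection output, might look roughly like the sketch below. The abstract does not specify a token layout, so everything concrete here is an assumption for illustration: the five-token grouping (ymin, xmin, ymax, xmax, category), the bin count NUM_BINS, the CATEGORIES list, and the helper name decode_detections are all hypothetical.

```python
from dataclasses import dataclass

# Assumed vocabulary layout (not stated in the abstract):
#   token ids [0, NUM_BINS)                      -> discretized numbers
#   token ids [NUM_BINS, NUM_BINS + categories)  -> object categories
NUM_BINS = 1000
CATEGORIES = ["person", "dog", "car"]  # hypothetical category set


@dataclass
class Detection:
    box: tuple      # (ymin, xmin, ymax, xmax) in pixels
    category: str


def decode_detections(tokens, image_height, image_width):
    """Decode a flat token sequence into boxes and labels.

    Assumes each object is written as five consecutive tokens:
    four coordinate tokens followed by one category token.
    """
    detections = []
    usable = len(tokens) - len(tokens) % 5  # ignore a trailing partial group
    for start in range(0, usable, 5):
        *coord_tokens, class_token = tokens[start:start + 5]
        # Skip groups whose tokens fall in the wrong part of the vocabulary.
        if not all(0 <= t < NUM_BINS for t in coord_tokens):
            continue
        if not NUM_BINS <= class_token < NUM_BINS + len(CATEGORIES):
            continue
        # Map each bin index back to a fraction of the image size.
        ymin, xmin, ymax, xmax = (t / (NUM_BINS - 1) for t in coord_tokens)
        detections.append(
            Detection(
                box=(ymin * image_height, xmin * image_width,
                     ymax * image_height, xmax * image_width),
                category=CATEGORIES[class_token - NUM_BINS],
            )
        )
    return detections
```

    Sharing one vocabulary between numbers and categories is what lets a single sequence model emit both the box and the label; the decoder only needs to know which id range each token falls in.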

    GENERATING VIDEOS USING DIFFUSION MODELS
    Published patent application

    Publication number: US20240338936A1

    Publication date: 2024-10-10

    Application number: US18296938

    Filing date: 2023-04-06

    Applicant: Google LLC

    CPC classification number: G06V10/82 G06V10/771 H04N7/0117 H04N7/013

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output video conditioned on an input. In one aspect, a method comprises receiving the input; initializing a current intermediate representation; generating an output video by updating the current intermediate representation at each of a plurality of iterations, wherein the updating comprises, at each iteration: processing an intermediate input for the iteration comprising the current intermediate representation using a diffusion model that is configured to process the intermediate input to generate a noise output; and updating the current intermediate representation using the noise output for the iteration.
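
    The iterative structure described in the abstract (initialize a representation, ask a model for a noise output, update the representation with it) can be sketched as follows. This is only a toy under stated assumptions: toy_noise_model is a hypothetical stand-in for a trained diffusion model, the update rule (subtracting a scaled noise output) is a placeholder rather than any particular sampler, and a short flat list of floats stands in for video frames.

```python
import random


def toy_noise_model(representation, conditioning, iteration):
    """Hypothetical stand-in for a trained diffusion model.

    Returns a "noise output" for the current representation. Here the
    noise is simply the gap between the representation and the
    conditioning values, so removing it moves toward the conditioning.
    """
    return [x - c for x, c in zip(representation, conditioning)]


def generate(conditioning, num_iterations=20, step_size=0.5, seed=0):
    """Run the initialize / predict-noise / update loop."""
    rng = random.Random(seed)
    # Initialize the current intermediate representation from Gaussian noise.
    current = [rng.gauss(0.0, 1.0) for _ in conditioning]
    for iteration in range(num_iterations):
        # The intermediate input includes the current representation.
        noise = toy_noise_model(current, conditioning, iteration)
        # Update the representation using the noise output (placeholder rule).
        current = [x - step_size * n for x, n in zip(current, noise)]
    return current
```

    In this toy each iteration halves the remaining gap to the conditioning values, which makes the loop easy to check; a real sampler would use a learned model and a noise schedule instead.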
