-
Publication No.: US20240161459A1
Publication Date: 2024-05-16
Application No.: US18422887
Application Date: 2024-01-25
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
IPC: G06V10/764, G06F40/40, G06V10/22, G06V10/74, G06V10/774, G06V10/776, G06V10/82
CPC classification number: G06V10/764, G06F40/40, G06V10/225, G06V10/761, G06V10/774, G06V10/776, G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
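As a rough illustration of the detection flow in this abstract, the sketch below gives each object embedding a box and a score distribution over the query embeddings. It assumes the image encoding subnetwork has already produced the object embeddings, computes classification logits as dot products normalized by a softmax over the queries, and uses a hypothetical sigmoid-activated linear layer (`box_weights`) in place of the localization subnetwork. These choices are illustrative assumptions, not details taken from the patent.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # normalized (cx, cy, w, h)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def localize(obj: Sequence[float], box_weights: Sequence[Sequence[float]]) -> Box:
    """Hypothetical localization head: one linear projection per box
    coordinate, squashed into [0, 1]."""
    cx, cy, w, h = (sigmoid(dot(obj, row)) for row in box_weights)
    return (cx, cy, w, h)


def classify(objects, queries) -> List[List[float]]:
    """For each object embedding, a score distribution over the queries."""
    return [softmax([dot(o, q) for q in queries]) for o in objects]


def detect(objects, queries, box_weights) -> List[Tuple[Box, List[float]]]:
    """Pair each object's box with its score distribution over the queries."""
    boxes = [localize(o, box_weights) for o in objects]
    scores = classify(objects, queries)
    return list(zip(boxes, scores))


# Toy inputs: two object embeddings and two query embeddings
# (e.g. embeddings of hypothetical category prompts such as "cat" and "dog").
objects = [[1.0, 0.0], [0.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
box_weights = [[0.0, 0.0]] * 4
detections = detect(objects, queries, box_weights)
```

Because the queries are inputs rather than a fixed output layer in this sketch, the set of categories can change at inference time without touching the classifier weights.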
-
Publication No.: US20230360365A1
Publication Date: 2023-11-09
Application No.: US18144045
Application Date: 2023-05-05
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
IPC: G06V10/764, G06F40/40, G06V10/82, G06V10/22, G06V10/774, G06V10/776, G06V10/74
CPC classification number: G06V10/764, G06F40/40, G06V10/82, G06V10/225, G06V10/774, G06V10/776, G06V10/761
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20250148759A1
Publication Date: 2025-05-08
Application No.: US19014029
Application Date: 2025-01-08
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
IPC: G06V10/764, G06F40/40, G06V10/22, G06V10/74, G06V10/774, G06V10/776, G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20240338936A1
Publication Date: 2024-10-10
Application No.: US18296938
Application Date: 2023-04-06
Applicant: Google LLC
Inventor: Jonathan Ho, Tim Salimans, Alexey Alexeevich Gritsenko, William Chan, Mohammad Norouzi, David James Fleet
IPC: G06V10/82, G06V10/771, H04N7/01
CPC classification number: G06V10/82, G06V10/771, H04N7/0117, H04N7/013
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output video conditioned on an input. In one aspect, a method comprises receiving the input; initializing a current intermediate representation; generating an output video by updating the current intermediate representation at each of a plurality of iterations, wherein the updating comprises, at each iteration: processing an intermediate input for the iteration comprising the current intermediate representation using a diffusion model that is configured to process the intermediate input to generate a noise output; and updating the current intermediate representation using the noise output for the iteration.
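The iterative loop in this abstract (initialize an intermediate representation, then repeatedly obtain a noise output from a diffusion model and update the representation with it) can be sketched as follows. The update rule is a deterministic DDIM-style step, the linear `alpha_bar` schedule is hypothetical, and the "diffusion model" is a toy oracle that knows the clean target video so the loop can be checked end to end. A real system would use a trained network conditioned on the input; none of these specifics come from the patent text.

```python
import math
import random
from typing import Callable, List

Video = List[List[float]]  # num_frames x flattened pixels per frame
NoiseModel = Callable[[Video, int, Video], Video]


def alpha_bar(t: int, num_steps: int) -> float:
    # Hypothetical linear schedule: 1.0 (clean) at t=0, 1/(num_steps+1) at t=num_steps.
    return 1.0 - t / (num_steps + 1)


def generate_video(cond: Video, noise_model: NoiseModel, num_frames: int,
                   frame_size: int, num_steps: int = 10, seed: int = 0) -> Video:
    rng = random.Random(seed)
    # Initialize the current intermediate representation with Gaussian noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(frame_size)] for _ in range(num_frames)]
    for t in range(num_steps, 0, -1):
        ab_t = alpha_bar(t, num_steps)
        ab_prev = alpha_bar(t - 1, num_steps)
        # The diffusion model maps the intermediate input to a noise output.
        eps = noise_model(x, t, cond)
        updated = []
        for frame, eps_frame in zip(x, eps):
            new_frame = []
            for xi, ei in zip(frame, eps_frame):
                # DDIM-style update: estimate the clean value, then
                # re-noise it to the next (lower) noise level.
                x0_hat = (xi - math.sqrt(1.0 - ab_t) * ei) / math.sqrt(ab_t)
                new_frame.append(math.sqrt(ab_prev) * x0_hat
                                 + math.sqrt(1.0 - ab_prev) * ei)
            updated.append(new_frame)
        x = updated
    return x


def make_oracle_noise_model(num_steps: int) -> NoiseModel:
    """Toy stand-in for a trained network: it receives the clean target video
    as its conditioning input and returns the exact noise."""
    def model(x: Video, t: int, target: Video) -> Video:
        ab_t = alpha_bar(t, num_steps)
        return [[(xi - math.sqrt(ab_t) * ti) / math.sqrt(1.0 - ab_t)
                 for xi, ti in zip(frame, target_frame)]
                for frame, target_frame in zip(x, target)]
    return model


target = [[0.1, 0.2], [0.3, 0.4]]
video = generate_video(target, make_oracle_noise_model(10), num_frames=2, frame_size=2)
```

With a perfect noise predictor the final step (where `alpha_bar` reaches 1.0) returns the clean estimate exactly, which makes the sketch easy to sanity-check.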
-
Publication No.: US11928854B2
Publication Date: 2024-03-12
Application No.: US18144045
Application Date: 2023-05-05
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
IPC: G06K9/00, G06F40/40, G06V10/22, G06V10/74, G06V10/764, G06V10/774, G06V10/776, G06V10/82
CPC classification number: G06V10/764, G06F40/40, G06V10/225, G06V10/761, G06V10/774, G06V10/776, G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20230031702A1
Publication Date: 2023-02-02
Application No.: US17812208
Application Date: 2022-07-13
Applicant: Google LLC
Inventor: Yang Li, Xin Zhou, Gang Li, Mostafa Dehghani, Alexey Alexeevich Gritsenko
IPC: G06V10/82, G06F3/16, G06F40/284
Abstract: A method includes receiving, via a computing device, a screenshot of a display provided by a graphical user interface of the computing device. The method also includes generating, by an image-structure transformer of a neural network, a representation by fusing a first embedding based on the screenshot and a second embedding based on a layout of virtual objects in the screenshot. The method additionally includes predicting, by the neural network and based on the generated representation, a modeling task output associated with the graphical user interface. The method further includes providing, by the computing device, the predicted modeling task output.
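One way to picture "fusing" a screenshot embedding with a layout embedding is to concatenate the two token sequences and run joint self-attention over them, so every token can attend to both modalities, then pool the result into one representation. The sketch below does exactly that with a single attention head, identity query/key/value projections, and mean pooling. All of these are simplifying assumptions; the abstract does not describe the internals of the image-structure transformer.

```python
import math
from typing import List, Sequence

Token = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Sequence[Token]) -> List[Token]:
    """Single-head scaled dot-product self-attention with identity
    query/key/value projections (real layers learn these projections)."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def fuse(screenshot_tokens: Sequence[Token], layout_tokens: Sequence[Token]) -> Token:
    """Fuse screenshot and layout embeddings into one representation vector."""
    # Joint attention over the concatenated sequence mixes both modalities.
    tokens = list(screenshot_tokens) + list(layout_tokens)
    attended = self_attention(tokens)
    # Mean-pool the attended tokens into a single representation.
    d = len(tokens[0])
    return [sum(tok[j] for tok in attended) / len(attended) for j in range(d)]


representation = fuse([[1.0, 0.0]], [[0.0, 1.0]])
```

A task-specific head (for example, a classifier over hypothetical screen types) would then consume `representation` to produce the modeling task output.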
-
Publication No.: US20210383790A1
Publication Date: 2021-12-09
Application No.: US17339870
Application Date: 2021-06-04
Applicant: Google LLC
Inventor: Tim Salimans, Alexey Alexeevich Gritsenko
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a generative neural network to convert conditioning text inputs to audio outputs using energy scores.
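For readers unfamiliar with energy scores: the energy score of a predictive distribution P at an observation y is ES(P, y) = E||X - y|| - 0.5 E||X - X'||, where X and X' are independent draws from P, and lower is better. The sketch below computes the standard two-sample unbiased estimate, which a generator could minimize by producing two samples for the same conditioning input. The use of plain Euclidean distance on raw vectors is an assumption; the abstract does not state which distance or representation the training uses.

```python
import math
from typing import Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_score_estimate(sample_1: Sequence[float], sample_2: Sequence[float],
                          target: Sequence[float]) -> float:
    """Two-sample unbiased estimate of ES(P, y) = E||X - y|| - 0.5 * E||X - X'||."""
    # Attraction term: pulls each generated sample toward the target.
    attraction = 0.5 * (l2(sample_1, target) + l2(sample_2, target))
    # Repulsion term: rewards diversity between the two samples.
    repulsion = 0.5 * l2(sample_1, sample_2)
    return attraction - repulsion


target = [0.0, 0.0]
collapsed = energy_score_estimate([3.0, 4.0], [3.0, 4.0], target)
perfect = energy_score_estimate(target, target, target)
```

The repulsion term is what distinguishes this from a plain distance loss: two identical samples that both miss the target are penalized by their full distance, so the model is not rewarded for collapsing to a single output.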
-
Publication No.: US12230011B2
Publication Date: 2025-02-18
Application No.: US18422887
Application Date: 2024-01-25
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
IPC: G06K9/00, G06F40/40, G06V10/22, G06V10/74, G06V10/764, G06V10/774, G06V10/776, G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20240346824A1
Publication Date: 2024-10-17
Application No.: US18634794
Application Date: 2024-04-12
Applicant: Google LLC
Inventor: Alexey Alexeevich Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Luise Schmid, Anurag Arnab
IPC: G06V20/40, G06T7/73, G06V10/62, G06V10/764, G06V10/77, G06V10/774, G06V10/776, G06V10/82
CPC classification number: G06V20/46, G06T7/73, G06V10/62, G06V10/764, G06V10/7715, G06V10/774, G06V10/776, G06V10/82, G06T2207/10016, G06T2207/20081, G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.
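The action localization output described here (for each agent, a per-frame bounding box plus the action being performed) can be represented and decoded as in the sketch below. It assumes each query vector has already been decoded, by a network not shown, into per-frame `(box, action_probs)` predictions, and that a hypothetical confidence `threshold` decides which frames and agents to keep. The decoding rule is an illustrative assumption, not the method claimed in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


@dataclass
class AgentTrack:
    # frame index -> (bounding box, action label) for one agent
    frames: Dict[int, Tuple[Box, str]] = field(default_factory=dict)


def decode(query_predictions: Sequence[Sequence[Tuple[Box, Sequence[float]]]],
           actions: Sequence[str], threshold: float = 0.5) -> List[AgentTrack]:
    """Turn per-query, per-frame predictions into agent tracks."""
    tracks = []
    for per_frame in query_predictions:  # one entry per query vector
        track = AgentTrack()
        for frame_idx, (box, probs) in enumerate(per_frame):
            best = max(range(len(probs)), key=probs.__getitem__)
            # Keep the frame only if the most likely action is confident enough.
            if probs[best] >= threshold:
                track.frames[frame_idx] = (box, actions[best])
        if track.frames:  # drop queries that matched no agent
            tracks.append(track)
    return tracks


actions = ["walk", "sit"]
predictions = [
    [((0.1, 0.1, 0.2, 0.4), [0.9, 0.1]), ((0.12, 0.1, 0.22, 0.4), [0.8, 0.2])],
    [((0.5, 0.5, 0.6, 0.6), [0.4, 0.3]), ((0.5, 0.5, 0.6, 0.6), [0.45, 0.2])],
]
tracks = decode(predictions, actions)
```

Here the second query never reaches the threshold and is discarded, so only one agent track remains.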
-
Publication No.: US12073819B2
Publication Date: 2024-08-27
Application No.: US17339870
Application Date: 2021-06-04
Applicant: Google LLC
Inventor: Tim Salimans, Alexey Alexeevich Gritsenko
CPC classification number: G10L13/047, G06N3/08, G10L13/08, G10L25/18, G10L25/21, G10L25/30, G10L25/51
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a generative neural network to convert conditioning text inputs to audio outputs using energy scores.