-
Publication No.: US20240161459A1
Publication date: 2024-05-16
Application No.: US18422887
Filing date: 2024-01-25
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer , Alexey Alexeevich Gritsenko , Austin Charles Stone , Dirk Weissenborn , Alexey Dosovitskiy , Neil Matthew Tinmouth Houlsby
IPC: G06V10/764 , G06F40/40 , G06V10/22 , G06V10/74 , G06V10/774 , G06V10/776 , G06V10/82
CPC classification number: G06V10/764 , G06F40/40 , G06V10/225 , G06V10/761 , G06V10/774 , G06V10/776 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
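The classification step in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the classification subnetwork scores each object embedding against each query embedding with a dot product and normalizes the scores with a softmax; the abstract does not specify either choice. The names `softmax`, `dot`, and `classify` and the toy embeddings are hypothetical, and the image encoding and localization subnetworks are not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def classify(object_embeddings, query_embeddings):
    """Return, for each object embedding, a score distribution over the queries.

    Assumption: similarity is a plain dot product followed by a softmax.
    """
    return [
        softmax([dot(obj, query) for query in query_embeddings])
        for obj in object_embeddings
    ]


# Toy data: two object embeddings and two query embeddings (one per category).
object_embeddings = [[1.0, 0.0], [0.0, 2.0]]
query_embeddings = [[1.0, 0.0], [0.0, 1.0]]

distributions = classify(object_embeddings, query_embeddings)
for index, distribution in enumerate(distributions):
    print(index, distribution)
```

Each row of `distributions` sums to 1 and has one entry per query embedding, so adding or removing a category only changes the set of query embeddings, not the scoring code.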
-
Publication No.: US12230011B2
Publication date: 2025-02-18
Application No.: US18422887
Filing date: 2024-01-25
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer , Alexey Alexeevich Gritsenko , Austin Charles Stone , Dirk Weissenborn , Alexey Dosovitskiy , Neil Matthew Tinmouth Houlsby
IPC: G06K9/00 , G06F40/40 , G06V10/22 , G06V10/74 , G06V10/764 , G06V10/774 , G06V10/776 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US11928854B2
Publication date: 2024-03-12
Application No.: US18144045
Filing date: 2023-05-05
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer , Alexey Alexeevich Gritsenko , Austin Charles Stone , Dirk Weissenborn , Alexey Dosovitskiy , Neil Matthew Tinmouth Houlsby
IPC: G06K9/00 , G06F40/40 , G06V10/22 , G06V10/74 , G06V10/764 , G06V10/774 , G06V10/776 , G06V10/82
CPC classification number: G06V10/764 , G06F40/40 , G06V10/225 , G06V10/761 , G06V10/774 , G06V10/776 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20220383628A1
Publication date: 2022-12-01
Application No.: US17726374
Filing date: 2022-04-21
Applicant: Google LLC
Inventor: Thomas Kipf , Gamaleldin Elsayed , Aravindh Mahendran , Austin Charles Stone , Sara Sabour Rouh Aghdam , Georg Heigold , Rico Jonschkowski , Alexey Dosovitskiy , Klaus Greff
IPC: G06V10/82 , G06V10/40 , G06V10/774
Abstract: A method includes obtaining first feature vectors and second feature vectors representing contents of a first and second image frame, respectively, of an input video. The method may also include generating, based on the first feature vectors, first slot vectors, where each slot vector represents attributes of a corresponding entity as represented in the first image frame, and generating, based on the first slot vectors, predicted slot vectors including a corresponding predicted slot vector that represents a transition of the attributes of the corresponding entity from the first to the second image frame. The method may additionally include generating, based on the predicted slot vectors and the second feature vectors, second slot vectors including a corresponding slot vector that represents the attributes of the corresponding entity as represented in the second image frame, and determining an output based on the predicted slot vectors or the second slot vectors.
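The predict-then-correct flow over slot vectors described in this abstract can be sketched as follows. Everything concrete here is an assumption made for illustration: the predictor is a fixed additive shift standing in for a learned transition model, and the corrector is a single attention-style update in which each second-frame feature is softly assigned across the slots. The names `predict_slots` and `correct_slots` and the toy vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_slots(slots, step):
    """Hypothetical predictor: shift every slot by a fixed step.

    Stands in for a learned model of how each entity changes between frames.
    """
    return [[value + delta for value, delta in zip(slot, step)] for slot in slots]


def correct_slots(predicted_slots, features):
    """Hypothetical corrector: one attention-style update from frame features.

    Each feature is softly assigned across the slots (softmax over slots);
    each slot then becomes the weighted mean of the features assigned to it.
    """
    weights = []
    for feature in features:
        logits = [dot(feature, slot) for slot in predicted_slots]
        m = max(logits)
        exps = [math.exp(logit - m) for logit in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])

    corrected = []
    for k, slot in enumerate(predicted_slots):
        mass = sum(w[k] for w in weights)
        corrected.append([
            sum(w[k] * feature[d] for w, feature in zip(weights, features)) / mass
            for d in range(len(slot))
        ])
    return corrected


# Toy data: two slots from the first frame, two feature vectors from the second.
first_slots = [[0.5, 0.0], [0.0, 0.5]]
second_features = [[4.0, 0.0], [0.0, 4.0]]

predicted_slots = predict_slots(first_slots, step=[0.5, 0.5])
second_slots = correct_slots(predicted_slots, second_features)
print(predicted_slots)
print(second_slots)
```

An output could then be read from either `predicted_slots` or `second_slots`, matching the final step of the abstract.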
-
Publication No.: US20230360365A1
Publication date: 2023-11-09
Application No.: US18144045
Filing date: 2023-05-05
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer , Alexey Alexeevich Gritsenko , Austin Charles Stone , Dirk Weissenborn , Alexey Dosovitskiy , Neil Matthew Tinmouth Houlsby
IPC: G06V10/764 , G06F40/40 , G06V10/82 , G06V10/22 , G06V10/774 , G06V10/776 , G06V10/74
CPC classification number: G06V10/764 , G06F40/40 , G06V10/82 , G06V10/225 , G06V10/774 , G06V10/776 , G06V10/761
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20250148759A1
Publication date: 2025-05-08
Application No.: US19014029
Filing date: 2025-01-08
Applicant: Google LLC
Inventor: Matthias Johannes Lorenz Minderer , Alexey Alexeevich Gritsenko , Austin Charles Stone , Dirk Weissenborn , Alexey Dosovitskiy , Neil Matthew Tinmouth Houlsby
IPC: G06V10/764 , G06F40/40 , G06V10/22 , G06V10/74 , G06V10/774 , G06V10/776 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
-
Publication No.: US20240189994A1
Publication date: 2024-06-13
Application No.: US18539171
Filing date: 2023-12-13
Applicant: Google LLC
Inventor: Keerthana P G , Karol Hausman , Julian Ibarz , Brian Ichter , Alexander Irpan , Dmitry Kalashnikov , Yao Lu , Kanury Kanishka Rao , Michael Sahngwon Ryoo , Austin Charles Stone , Teddey Ming Xiao , Quan Ho Vuong , Sumedh Anand Sontakke
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment. In one aspect, a method comprises: receiving a natural language text sequence that characterizes a task to be performed by the agent in the environment; generating an encoded representation of the natural language text sequence; and at each of a plurality of time steps: obtaining an observation image characterizing a state of the environment at the time step; processing the observation image to generate an encoded representation of the observation image; generating a sequence of input tokens; processing the sequence of input tokens to generate a policy output that defines an action to be performed by the agent in response to the observation image; selecting an action to be performed by the agent using the policy output; and causing the agent to perform the selected action.
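The per-time-step control loop in this abstract can be sketched as below. The encoders and the policy are toy stand-ins invented for this example (word lengths for text, row means for images, a deterministic scoring rule for the policy), and action selection is assumed to be a greedy argmax; none of these choices come from the abstract. All function names are hypothetical.

```python
def encode_text(instruction):
    """Toy text encoder: one token per word (the word length as a float)."""
    return [float(len(word)) for word in instruction.split()]


def encode_image(image):
    """Toy image encoder: one token per pixel row (the row mean)."""
    return [sum(row) / len(row) for row in image]


def policy(tokens, num_actions):
    """Toy policy: deterministic scores over actions, derived from the tokens."""
    target = sum(tokens) % num_actions
    return [-abs(target - index) for index in range(num_actions)]


def run_episode(instruction, observations, actions):
    """Encode the instruction once, then select one action per observation."""
    text_tokens = encode_text(instruction)
    performed = []
    for image in observations:
        image_tokens = encode_image(image)
        input_tokens = text_tokens + image_tokens  # sequence of input tokens
        scores = policy(input_tokens, len(actions))
        best = max(range(len(actions)), key=lambda index: scores[index])
        performed.append(actions[best])  # stand-in for commanding the agent
    return performed


# Toy data: two 2x2 observation images and three candidate actions.
observations = [[[0.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
actions = ["move_left", "move_right", "grasp"]

performed = run_episode("pick up the cup", observations, actions)
print(performed)
```

The instruction is encoded once before the loop, while the observation is re-encoded at every time step, which mirrors the ordering of steps in the abstract.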