Patent search ap:("Google LLC") AND inv:"Alexey Alexeevich Gritsenko" Page 1

1.

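Invention publication
OPEN-VOCABULARY OBJECT DETECTION IN IMAGES (Under examination, published)

Publication (announcement) number: US20240161459A1

Publication (announcement) date: 2024-05-16

Application number: US18422887

Application date: 2024-01-25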

Applicant: Google LLC

Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby

IPC: G06V10/764, G06F40/40, G06V10/22, G06V10/74, G06V10/774, G06V10/776, G06V10/82

CPC classification number: G06V10/764, G06F40/40, G06V10/225, G06V10/761, G06V10/774, G06V10/776, G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
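The classification step in this abstract (one score distribution over the query embeddings for each object embedding) can be illustrated with a minimal sketch. The abstract does not say how the classification subnetwork computes its scores, so the cosine similarity, the softmax, the `temperature` value, and all function names below are assumptions chosen for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(object_embeddings, query_embeddings, temperature=0.1):
    """Return, for each object embedding, a distribution over the queries.

    Assumption: similarity + softmax stands in for the classification
    subnetwork, whose real form is not given in the abstract.
    """
    return [
        softmax([cosine(obj, q) / temperature for q in query_embeddings])
        for obj in object_embeddings
    ]


# Toy 2-D embeddings: two object categories (queries) and two detected objects.
queries = [[1.0, 0.0], [0.0, 1.0]]
objects = [[0.9, 0.1], [0.2, 0.8]]
distributions = classify(objects, queries)
```

Because the categories enter only as query embeddings, changing the vocabulary in this sketch means passing a different `queries` list; nothing else needs to change.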

2.

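Invention publication
OPEN-VOCABULARY OBJECT DETECTION IN IMAGES (Under examination, published)

Publication (announcement) number: US20230360365A1

Publication (announcement) date: 2023-11-09

Application number: US18144045

Application date: 2023-05-05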

Applicant: Google LLC

Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby

IPC: G06V10/764, G06F40/40, G06V10/82, G06V10/22, G06V10/774, G06V10/776, G06V10/74

CPC classification number: G06V10/764, G06F40/40, G06V10/82, G06V10/225, G06V10/774, G06V10/776, G06V10/761

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.

3.

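Invention application
OPEN-VOCABULARY OBJECT DETECTION IN IMAGES (In force)

Publication (announcement) number: US20250148759A1

Publication (announcement) date: 2025-05-08

Application number: US19014029

Application date: 2025-01-08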

Applicant: Google LLC

Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby

IPC: G06V10/764, G06F40/40, G06V10/22, G06V10/74, G06V10/774, G06V10/776, G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.

4.

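Invention publication
GENERATING VIDEOS USING DIFFUSION MODELS (Under examination, published)

Publication (announcement) number: US20240338936A1

Publication (announcement) date: 2024-10-10

Application number: US18296938

Application date: 2023-04-06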

Applicant: Google LLC

Inventor: Jonathan Ho, Tim Salimans, Alexey Alexeevich Gritsenko, William Chan, Mohammad Norouzi, David James Fleet

IPC: G06V10/82, G06V10/771, H04N7/01

CPC classification number: G06V10/82, G06V10/771, H04N7/0117, H04N7/013

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output video conditioned on an input. In one aspect, a method comprises receiving the input; initializing a current intermediate representation; generating an output video by updating the current intermediate representation at each of a plurality of iterations, wherein the updating comprises, at each iteration: processing an intermediate input for the iteration comprising the current intermediate representation using a diffusion model that is configured to process the intermediate input to generate a noise output; and updating the current intermediate representation using the noise output for the iteration.
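The iterative loop in this abstract (initialize an intermediate representation, predict a noise output, update the representation, repeat) can be sketched as follows. Everything concrete here is an assumption made for illustration: `toy_noise_model` is a hypothetical placeholder for a trained diffusion model, the flat list of floats stands in for video frames, and the update rule is a simplified fixed-step subtraction rather than any sampler described in the patent.

```python
import random


def toy_noise_model(representation, conditioning):
    """Hypothetical stand-in for a trained diffusion model.

    It "predicts" the noise as the gap between the current values and the
    conditioning input; a real model would be a learned neural network.
    """
    return [x - c for x, c in zip(representation, conditioning)]


def generate_video(conditioning, num_iterations=50, step_size=0.2, seed=0):
    """Run the iterative update loop and return the final representation."""
    rng = random.Random(seed)
    # Initialize the current intermediate representation with Gaussian noise.
    current = [rng.gauss(0.0, 1.0) for _ in conditioning]
    for _ in range(num_iterations):
        # Process the intermediate input to obtain a noise output.
        noise = toy_noise_model(current, conditioning)
        # Update the representation using the noise output
        # (assumption: remove a fixed fraction of the predicted noise).
        current = [x - step_size * n for x, n in zip(current, noise)]
    return current


# Toy conditioning input: eight "pixel" values the output should match.
conditioning = [0.5] * 8
video = generate_video(conditioning)
```

In this toy setting each iteration shrinks the remaining gap to the conditioning values by a factor of `1 - step_size`, so the output converges toward them as the number of iterations grows.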

5.

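Invention grant
Open-vocabulary object detection in images (In force)

Publication (announcement) number: US11928854B2

Publication (announcement) date: 2024-03-12

Application number: US18144045

Application date: 2023-05-05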

Applicant: Google LLC

Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby

IPC: G06K9/00, G06F40/40, G06V10/22, G06V10/74, G06V10/764, G06V10/774, G06V10/776, G06V10/82

CPC classification number: G06V10/764, G06F40/40, G06V10/225, G06V10/761, G06V10/774, G06V10/776, G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.

6.

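Invention application
Neural Networks based Multimodal Transformer for Multi-Task User Interface Modeling (In force)

Publication (announcement) number: US20230031702A1

Publication (announcement) date: 2023-02-02

Application number: US17812208

Application date: 2022-07-13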

Applicant: Google LLC

Inventor: Yang Li, Xin Zhou, Gang Li, Mostafa Dehghani, Alexey Alexeevich Gritsenko

IPC: G06V10/82, G06F3/16, G06F40/284

Abstract: A method includes receiving, via a computing device, a screenshot of a display provided by a graphical user interface of the computing device. The method also includes generating, by an image-structure transformer of a neural network, a representation by fusing a first embedding based on the screenshot and a second embedding based on a layout of virtual objects in the screenshot. The method additionally includes predicting, by the neural network and based on the generated representation, a modeling task output associated with the graphical user interface. The method further includes providing, by the computing device, the predicted modeling task output.

7.

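Invention application
TRAINING SPEECH SYNTHESIS NEURAL NETWORKS USING ENERGY SCORES (In force)

Publication (announcement) number: US20210383790A1

Publication (announcement) date: 2021-12-09

Application number: US17339870

Application date: 2021-06-04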

Applicant: Google LLC

Inventor: Tim Salimans, Alexey Alexeevich Gritsenko

IPC: G10L13/047, G10L25/21, G10L25/51, G10L25/18, G10L13/08, G10L25/30, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a generative neural network to convert conditioning text inputs to audio outputs using energy scores.
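As background on the term, an energy score in the general statistical sense compares samples from a model with an observed target: it rewards samples that lie close to the target while penalizing samples that collapse onto each other. The sketch below is a two-sample estimate of that generic score with Euclidean distance. The abstract does not state which distance or estimator the patent uses, so the distance choice and the function names are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_score(sample_a, sample_b, target):
    """Two-sample estimate of E||Y - x|| - 0.5 * E||Y - Y'|| (lower is better).

    sample_a and sample_b are two independent model outputs for the same
    conditioning input; target is the observed output x.
    """
    attract = 0.5 * (euclidean(sample_a, target) + euclidean(sample_b, target))
    repel = 0.5 * euclidean(sample_a, sample_b)
    return attract - repel


# Two identical samples at distance 5 from the target: no diversity credit.
score = energy_score([3.0, 4.0], [3.0, 4.0], [0.0, 0.0])
```

By the triangle inequality this estimate is never negative, and it is zero when both samples equal the target.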

8.

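Invention grant
Open-vocabulary object detection in images (In force)

Publication (announcement) number: US12230011B2

Publication (announcement) date: 2025-02-18

Application number: US18422887

Application date: 2024-01-25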

Applicant: Google LLC

Inventor: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby

IPC: G06K9/00, G06F40/40, G06V10/22, G06V10/74, G06V10/764, G06V10/774, G06V10/776, G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.

9.

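Invention publication
ACTION LOCALIZATION IN VIDEOS USING LEARNED QUERIES (Under examination, published)

Publication (announcement) number: US20240346824A1

Publication (announcement) date: 2024-10-17

Application number: US18634794

Application date: 2024-04-12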

Applicant: Google LLC

Inventor: Alexey Alexeevich Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Luise Schmid, Anurag Arnab

IPC: G06V20/40, G06T7/73, G06V10/62, G06V10/764, G06V10/77, G06V10/774, G06V10/776, G06V10/82

CPC classification number: G06V20/46, G06T7/73, G06V10/62, G06V10/764, G06V10/7715, G06V10/774, G06V10/776, G06V10/82, G06T2207/10016, G06T2207/20081, G06T2207/20084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.
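The output structure described in this abstract (per agent, per frame, one bounding box and one action) could be represented as in the sketch below. The class names, field names, box coordinate order, and the `actions_by_frame` helper are all hypothetical and serve only to make the shape of the data concrete.

```python
from dataclasses import dataclass, field


@dataclass
class FrameDetection:
    frame_index: int
    box: tuple   # assumed order: (x_min, y_min, x_max, y_max) in pixels
    action: str  # one label from the set of actions


@dataclass
class AgentTrack:
    agent_id: int
    detections: list = field(default_factory=list)


def actions_by_frame(tracks):
    """Group (agent_id, action) pairs by frame index."""
    result = {}
    for track in tracks:
        for det in track.detections:
            result.setdefault(det.frame_index, []).append(
                (track.agent_id, det.action)
            )
    return result


# Toy output: agent 0 appears in frames 0 and 1, agent 1 only in frame 1.
tracks = [
    AgentTrack(0, [
        FrameDetection(0, (10, 20, 50, 80), "walking"),
        FrameDetection(1, (12, 20, 52, 80), "walking"),
    ]),
    AgentTrack(1, [FrameDetection(1, (100, 30, 140, 90), "sitting")]),
]
by_frame = actions_by_frame(tracks)
```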

10.

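Invention grant
Training speech synthesis neural networks using energy scores (In force)

Publication (announcement) number: US12073819B2

Publication (announcement) date: 2024-08-27

Application number: US17339870

Application date: 2021-06-04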

Applicant: Google LLC

Inventor: Tim Salimans, Alexey Alexeevich Gritsenko

IPC: G10L13/047, G06N3/08, G10L13/08, G10L25/18, G10L25/21, G10L25/30, G10L25/51

CPC classification number: G10L13/047, G06N3/08, G10L13/08, G10L25/18, G10L25/21, G10L25/30, G10L25/51

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a generative neural network to convert conditioning text inputs to audio outputs using energy scores.
