-
Publication No.: US20250005924A1
Publication Date: 2025-01-02
Application No.: US18577051
Filing Date: 2023-11-22
Applicant: Google LLC
Inventor: Anthony J. Piergiovanni, Wei-Cheng Kuo, Anelia Angelova
IPC: G06V20/40, G06V10/776, G06V10/82
Abstract: Provided are machine-learned models for performing video processing with improved efficiency. In particular, the machine-learned model can perform the sparse application of one or more video kernels to a set of video data to generate video tokens that can, for example, be provided as input to a visual transformer. Thus, example implementations of the present disclosure are directed to an approach which can turn a visual transformer (e.g., a ViT encoder) into an efficient video model. Furthermore, example implementations described herein work seamlessly with both image and video inputs. Specifically, by sparsely sampling the inputs, the model can be trained on, and run inference over, both kinds of input. The proposed model is easily scalable and can optionally be adapted to large-scale pre-trained visual transformers without requiring full fine-tuning.
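The sparse sampling step in this abstract can be illustrated with a minimal pure-Python sketch. It assumes a single-channel video stored as nested lists indexed `[T][H][W]`; the function name `sparse_tube_tokens` and the kernel, stride, and offset values are hypothetical choices for illustration, not the claimed implementation, and the linear projection of each tube into a transformer embedding is omitted.

```python
from typing import List, Tuple

# Assumption: single-channel video as nested lists indexed [T][H][W].
Video = List[List[List[float]]]


def sparse_tube_tokens(video: Video,
                       kernel: Tuple[int, int, int],
                       stride: Tuple[int, int, int],
                       offset: Tuple[int, int, int] = (0, 0, 0)) -> List[List[float]]:
    """Flatten 3D tubes of shape `kernel` sampled every `stride` voxels.

    Along any axis where the stride exceeds the kernel size, voxels between
    tubes are skipped, so only a sparse subset becomes tokens
    (hypothetical illustration).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = kernel
    st, sh, sw = stride
    ot, oh, ow = offset
    tokens = []
    for t0 in range(ot, T - kt + 1, st):
        for y0 in range(oh, H - kh + 1, sh):
            for x0 in range(ow, W - kw + 1, sw):
                tokens.append([video[t][y][x]
                               for t in range(t0, t0 + kt)
                               for y in range(y0, y0 + kh)
                               for x in range(x0, x0 + kw)])
    return tokens


# Usage: an 8-frame 4x4 clip and a single 4x4 image (a 1-frame "video").
clip = [[[float(t * 100 + y * 10 + x) for x in range(4)] for y in range(4)]
        for t in range(8)]
video_tokens = sparse_tube_tokens(clip, kernel=(2, 2, 2), stride=(4, 2, 2))
image_tokens = sparse_tube_tokens([clip[0]], kernel=(1, 2, 2), stride=(1, 2, 2))
print(len(video_tokens), len(image_tokens))  # 8 4
```

An image is handled as a one-frame video with a temporal kernel of 1, which mirrors the abstract's point that the same tokenizer can accept both image and video inputs.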
-
Publication No.: US20240355109A1
Publication Date: 2024-10-24
Application No.: US18746977
Filing Date: 2024-06-18
Applicant: Google LLC
Inventor: Michael Sahngwon Ryoo, Anthony Jacob Piergiovanni, Mingxing Tan, Anelia Angelova
IPC: G06V10/82, G06N3/045, G06T1/20, G06T3/4046, G06T7/207, G06V10/776
CPC classification number: G06V10/82, G06N3/045, G06T1/20, G06T3/4046, G06T7/207, G06V10/776, G06T2207/10016, G06T2207/20081, G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining one or more neural network architectures of a neural network for performing a video processing neural network task. In one aspect, a method comprises: at each of a plurality of iterations: selecting a parent neural network architecture from a set of neural network architectures; training a neural network having the parent neural network architecture to perform the video processing neural network task, comprising determining trained values of connection weight parameters of the parent neural network architecture; generating a new neural network architecture based at least in part on the trained values of the connection weight parameters of the parent neural network architecture; and adding the new neural network architecture to the set of neural network architectures.
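The iterative search loop in this abstract can be sketched as below. Here an architecture is assumed to be a set of directed connections between numbered blocks, `toy_train` is a hypothetical stand-in that returns a trained weight per connection, and the mutation rule (drop the weakest connection, then add a random forward connection) is an illustrative assumption about how trained connection weights could guide the next architecture; the abstract does not specify the selection or mutation policy.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]    # connection from block src to block dst
Arch = FrozenSet[Edge]    # assumption: an architecture is a set of connections


def mutate(weights: Dict[Edge, float], num_blocks: int,
           rng: random.Random) -> Arch:
    """Drop the connection with the smallest trained |weight|, then add a
    random forward connection (hypothetical mutation rule)."""
    edges = set(weights)
    if len(edges) > 1:
        edges.discard(min(edges, key=lambda e: abs(weights[e])))
    candidates = [(s, d) for s in range(num_blocks)
                  for d in range(s + 1, num_blocks) if (s, d) not in edges]
    if candidates:
        edges.add(rng.choice(candidates))
    return frozenset(edges)


def evolve(population: List[Arch],
           train: Callable[[Arch], Dict[Edge, float]],
           num_blocks: int, iterations: int, seed: int = 0) -> List[Arch]:
    rng = random.Random(seed)
    population = list(population)
    for _ in range(iterations):
        parent = rng.choice(population)       # selection policy is an assumption
        trained_weights = train(parent)       # trained connection weights
        population.append(mutate(trained_weights, num_blocks, rng))
    return population


# Hypothetical stand-in for training: longer-range connections get larger weights.
def toy_train(arch: Arch) -> Dict[Edge, float]:
    return {(s, d): float(d - s) for (s, d) in arch}


print(len(evolve([frozenset({(0, 1), (1, 2), (0, 2)})], toy_train,
                 num_blocks=4, iterations=3)))  # 4
```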
-
Publication No.: US20240037926A1
Publication Date: 2024-02-01
Application No.: US18379532
Filing Date: 2023-10-12
Applicant: Google LLC
Inventor: Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin
IPC: G06V10/82, G06V10/26, G06V10/25, G06V20/10, G06V10/764, G06V10/77, G06V10/44, G06T7/10, G06V10/774
CPC classification number: G06V10/82, G06V10/26, G06V10/25, G06V20/10, G06V10/764, G06V10/7715, G06V10/454, G06T7/10, G06V10/774, G06T2207/20081
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing instance segmentation by detecting and segmenting individual objects in an image. In one aspect, a method comprises: processing an image to generate data identifying a region of the image that depicts a particular object; obtaining data defining a plurality of example object segmentations; generating a respective weight value for each of the example object segmentations; for each of a plurality of pixels in the region of the image, determining a score characterizing a likelihood that the pixel is included in the particular object depicted in the region of the image using: (i) the example object segmentations, and (ii) the weight values for the example object segmentations; and generating a segmentation of the particular object depicted in the region of the image using the scores for the pixels in the region of the image.
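The per-pixel scoring step can be sketched as a weighted combination of the example segmentations. This sketch assumes the example segmentations are already resized to the detected region, that the weights come from a softmax over hypothetical per-example logits, and that a fixed 0.5 threshold produces the final mask; the abstract states none of these details.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def segment_region(examples: List[Grid], logits: List[float],
                   threshold: float = 0.5) -> Tuple[Grid, List[List[bool]]]:
    """Score each pixel as a weighted sum of example segmentations.

    Assumptions: `examples` are soft masks in [0, 1] already resized to the
    detected region; `logits` are hypothetical per-example predictions.
    """
    weights = softmax(logits)
    h, w = len(examples[0]), len(examples[0][0])
    scores = [[sum(wk * ex[y][x] for wk, ex in zip(weights, examples))
               for x in range(w)] for y in range(h)]
    mask = [[s >= threshold for s in row] for row in scores]
    return scores, mask


top_half = [[1.0, 1.0], [0.0, 0.0]]
left_half = [[1.0, 0.0], [1.0, 0.0]]
scores, mask = segment_region([top_half, left_half], logits=[0.0, 0.0])
print(scores)  # [[1.0, 0.5], [0.5, 0.0]]
print(mask)    # [[True, True], [True, False]]
```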
-
Publication No.: US11823443B2
Publication Date: 2023-11-21
Application No.: US17290814
Filing Date: 2019-08-14
Applicant: Google LLC
Inventor: Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin
IPC: G06V10/82, G06T7/10, G06V10/26, G06V10/25, G06V20/10, G06V10/764, G06V10/77, G06V10/44, G06V10/774
CPC classification number: G06V10/82, G06T7/10, G06V10/25, G06V10/26, G06V10/454, G06V10/764, G06V10/774, G06V10/7715, G06V20/10, G06T2207/20081
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing instance segmentation by detecting and segmenting individual objects in an image. In one aspect, a method comprises: processing an image to generate data identifying a region of the image that depicts a particular object; obtaining data defining a plurality of example object segmentations; generating a respective weight value for each of the example object segmentations; for each of a plurality of pixels in the region of the image, determining a score characterizing a likelihood that the pixel is included in the particular object depicted in the region of the image using: (i) the example object segmentations, and (ii) the weight values for the example object segmentations; and generating a segmentation of the particular object depicted in the region of the image using the scores for the pixels in the region of the image.
-
Publication No.: US20210319578A1
Publication Date: 2021-10-14
Application No.: US17272419
Filing Date: 2019-09-05
Applicant: Google LLC
Inventor: Vincent Michael Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova
Abstract: A system for generating a depth output for an image is described. The system receives input images that depict the same scene, each input image including one or more potential objects. The system generates, for each input image, a respective background image and processes the background images to generate a camera motion output that characterizes the motion of the camera between the input images. For each potential object, the system generates a respective object motion output for the potential object based on the input images and the camera motion output. The system processes a particular input image of the input images using a depth prediction neural network (NN) to generate a depth output for the particular input image, and updates the current values of parameters of the depth prediction NN based on the particular depth output, the camera motion output, and the object motion outputs for the potential objects.
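The background-image step can be sketched as masking out every pixel that any potential object covers. This sketch assumes grayscale images as nested lists and boolean object masks per image, and it assumes (as an illustrative choice) that the union of masks is taken across all input images so the same static region is kept in each background image; object pixels are replaced with 0.0.

```python
from typing import List

Image = List[List[float]]   # assumption: grayscale image as nested lists
Mask = List[List[bool]]     # True where a potential object is present


def background_images(images: List[Image],
                      masks_per_image: List[List[Mask]]) -> List[Image]:
    """Zero out every pixel covered by any object mask in any input image.

    Taking the union across all images (an illustrative assumption) keeps
    the same static region in every background image.
    """
    h, w = len(images[0]), len(images[0][0])
    static = [[not any(m[y][x] for masks in masks_per_image for m in masks)
               for x in range(w)] for y in range(h)]
    return [[[img[y][x] if static[y][x] else 0.0 for x in range(w)]
             for y in range(h)] for img in images]


frames = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
masks = [[[[True, False], [False, False]]],    # one object in frame 0
         [[[False, False], [False, True]]]]    # one object in frame 1
print(background_images(frames, masks))
# [[[0.0, 2.0], [3.0, 0.0]], [[0.0, 6.0], [7.0, 0.0]]]
```

The resulting background images, which exclude potentially moving objects, are what the abstract feeds to the camera-motion estimate.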
-
Publication No.: US10929996B2
Publication Date: 2021-02-23
Application No.: US16332991
Filing Date: 2017-09-12
Applicant: Google LLC
Inventor: Anelia Angelova, Martin Wicke, Reza Mahjourian
Abstract: A system includes an image depth prediction neural network implemented by one or more computers. The image depth prediction neural network is a recurrent neural network that is configured to receive a sequence of images and, for each image in the sequence: process the image in accordance with a current internal state of the recurrent neural network to (i) update the current internal state and (ii) generate a depth output that characterizes a predicted depth of a future image in the sequence.
-
Publication No.: US20240257510A1
Publication Date: 2024-08-01
Application No.: US18289725
Filing Date: 2021-08-06
Applicant: Google LLC
Inventor: Weicheng Kuo, Tsung-Yi Lin, Anelia Angelova, Dahun Kim
IPC: G06V10/82, G06V10/44, G06V10/774, G06V10/776, G06V20/00
CPC classification number: G06V10/82, G06V10/44, G06V10/774, G06V10/776, G06V20/00
Abstract: An object localization network (OLN) can be used to localize object(s) (e.g., known and/or unknown object(s)) in an instance of vision data. Various implementations include detecting the localized object(s) based on the localization. Many implementations include processing the instance of vision data using the OLN to generate an objectness score (e.g., a centerness score) as well as an intersection over union (IoU) score for one or more proposed object locations in the instance of vision data. Object(s) can be localized in the instance of vision data based on the objectness scores and the IoU scores.
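The scoring described in this abstract can be sketched with standard box geometry. The helpers below assume axis-aligned boxes `(x1, y1, x2, y2)`, use the FCOS-style centerness definition as an assumed form of the objectness score, and combine the two scores with a geometric mean to rank proposals; the combination rule and all names are illustrative assumptions rather than the claimed method.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def centerness(px: float, py: float, box: Box) -> float:
    """FCOS-style centerness of location (px, py) inside `box` (assumed form)."""
    l, t = px - box[0], py - box[1]
    r, b = box[2] - px, box[3] - py
    if min(l, t, r, b) <= 0:
        return 0.0                          # location lies outside the box
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def rank_proposals(boxes: List[Box], centerness_scores: List[float],
                   iou_scores: List[float]) -> List[Tuple[Box, float]]:
    """Rank proposals by the geometric mean of the two predicted scores
    (hypothetical combination rule)."""
    scored = [(box, math.sqrt(c * u))
              for box, c, u in zip(boxes, centerness_scores, iou_scores)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(centerness(2, 2, (0, 0, 4, 4)))       # 1.0
ranked = rank_proposals([(0, 0, 4, 4), (1, 1, 5, 5)], [0.81, 0.25], [0.25, 1.0])
print(ranked[0][0])                          # (1, 1, 5, 5)
```

Neither score depends on a class label, which is consistent with the abstract's aim of localizing unknown as well as known objects.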
-
Publication No.: US20240029413A1
Publication Date: 2024-01-25
Application No.: US18350845
Filing Date: 2023-07-12
Applicant: Google LLC
Inventor: Anthony Jacob Piergiovanni, Weiching Kuo, Wei Li, Anelia Angelova
IPC: G06V10/774, G06V10/25
CPC classification number: G06V10/774, G06V10/25, G06V2201/07
Abstract: A method involves the training of a model by dynamically adjusting the number of examples within each training batch. The dynamic adjustment is accomplished by adjusting the number of examples per task within each training batch according to the performance of the model on the tasks that the model is being trained on. In some embodiments, this method is applied to cross-modal vision-language tasks. This model may also be applied to the pre-training of a model that can be later fine-tuned for a more specific task(s).
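The per-task batch adjustment can be sketched as allocating more examples to tasks on which the model currently performs worse. This sketch assumes performance is reported as a per-task accuracy in [0, 1], weights each task by its error rate, reserves a hypothetical minimum number of examples per task, and rounds with the largest-remainder method so the batch size is preserved; the abstract does not specify this rule, and the task names are illustrative.

```python
import math
from typing import Dict


def allocate_batch(accuracy: Dict[str, float], batch_size: int,
                   min_per_task: int = 1) -> Dict[str, int]:
    """Split `batch_size` examples across tasks, giving more to weaker tasks.

    Assumptions: accuracy is in [0, 1]; each task's share is proportional to
    its error rate (1 - accuracy); `min_per_task` is a hypothetical floor.
    """
    tasks = list(accuracy)
    remaining = batch_size - min_per_task * len(tasks)
    if remaining < 0:
        raise ValueError("batch_size too small for min_per_task")
    errors = {t: max(1.0 - accuracy[t], 1e-6) for t in tasks}  # avoid all-zero
    total = sum(errors.values())
    shares = {t: remaining * errors[t] / total for t in tasks}
    counts = {t: min_per_task + math.floor(shares[t]) for t in tasks}
    # Largest-remainder rounding keeps the total equal to batch_size.
    leftover = batch_size - sum(counts.values())
    by_remainder = sorted(tasks, key=lambda t: shares[t] - math.floor(shares[t]),
                          reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


print(allocate_batch({"vqa": 0.75, "captioning": 0.5, "retrieval": 0.75}, 8))
# {'vqa': 2, 'captioning': 4, 'retrieval': 2}
```

Recomputing the allocation after each evaluation round gives the dynamic behavior the abstract describes: as a task's accuracy improves, its share of the batch shrinks.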
-
Publication No.: US11790549B2
Publication Date: 2023-10-17
Application No.: US17826849
Filing Date: 2022-05-27
Applicant: Google LLC
Inventor: Reza Mahjourian, Martin Wicke, Anelia Angelova
CPC classification number: G06T7/579, G06N3/045, G06N3/084, G06T7/285, G06T2207/20081
Abstract: A system includes a neural network implemented by one or more computers, in which the neural network includes an image depth prediction neural network and a camera motion estimation neural network. The neural network is configured to receive a sequence of images. The neural network is configured to process each image in the sequence of images using the image depth prediction neural network to generate, for each image, a respective depth output that characterizes a depth of the image, and to process a subset of images in the sequence of images using the camera motion estimation neural network to generate a camera motion output that characterizes the motion of a camera between the images in the subset. The image depth prediction neural network and the camera motion estimation neural network have been jointly trained using an unsupervised learning technique.
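One common unsupervised signal for jointly training depth and camera-motion networks is view synthesis: a pixel is back-projected with its predicted depth, moved by the predicted camera motion, and projected into another image, where a photometric difference can be measured. The abstract does not name this technique, so the sketch below is an assumption; it shows only the geometric reprojection for a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and motion `(R, t)`.

```python
from typing import Sequence, Tuple


def reproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float,
              R: Sequence[Sequence[float]], t: Sequence[float]) -> Tuple[float, float]:
    """Map pixel (u, v) with predicted depth into another view.

    Assumptions: pinhole intrinsics (fx, fy, cx, cy) and a predicted camera
    motion given as a 3x3 rotation R and translation t (hypothetical inputs).
    """
    # Back-project the pixel to a 3D point in the first camera's frame.
    p = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the estimated camera motion.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        raise ValueError("point projects behind the second camera")
    # Project back to pixel coordinates in the second image.
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(reproject(60.0, 50.0, 2.0, 100.0, 100.0, 50.0, 50.0,
                identity, [0.2, 0.0, 0.0]))  # (70.0, 50.0)
```

Comparing image intensities at `(u, v)` and at the reprojected location gives a loss that depends on both predicted depth and predicted motion, which is why the two networks can be trained jointly without depth labels under this assumption.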
-
Publication No.: US11783500B2
Publication Date: 2023-10-10
Application No.: US17272419
Filing Date: 2019-09-05
Applicant: Google LLC
Inventor: Vincent Michael Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova
CPC classification number: G06T7/55, G06N3/045, G06N3/088, G06T3/0093, G06T7/248, G06T2207/20081, G06T2207/20084
Abstract: A system for generating a depth output for an image is described. The system receives input images that depict the same scene, each input image including one or more potential objects. The system generates, for each input image, a respective background image and processes the background images to generate a camera motion output that characterizes the motion of the camera between the input images. For each potential object, the system generates a respective object motion output for the potential object based on the input images and the camera motion output. The system processes a particular input image of the input images using a depth prediction neural network (NN) to generate a depth output for the particular input image, and updates the current values of parameters of the depth prediction NN based on the particular depth output, the camera motion output, and the object motion outputs for the potential objects.