Visual Transformers with Sparse Application of Video Kernels

    Publication number: US20250005924A1

    Publication date: 2025-01-02

    Application number: US18577051

    Application date: 2023-11-22

    Applicant: Google LLC

    Abstract: Provided are machine-learned models for performing video processing with improved efficiency. In particular, the machine-learned model can perform the sparse application of one or more video kernels to a set of video data to generate video tokens that can, for example, be provided as input to a visual transformer. Thus, example implementations of the present disclosure are directed to an approach that can turn a visual transformer (e.g., a ViT encoder) into an efficient video model. Furthermore, example implementations described herein can work seamlessly with both image and video inputs. Specifically, by sparsely sampling the inputs, the model can be trained and run for inference on both input types. The proposed model is easily scalable and can optionally be adapted to large-scale pre-trained visual transformers without requiring full fine-tuning.
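
As an illustration of the idea in the abstract above, the following is a minimal sketch of sparse kernel application, not the patented implementation. It assumes that "sparse" means sliding a small spatio-temporal kernel over the video with a stride larger than the kernel, so only a few locations produce tokens. The function name `sparse_video_tokens`, the single-channel nested-list video layout, and all kernel sizes and strides are hypothetical choices made for this example.

```python
def sparse_video_tokens(video, kernels, stride):
    """Apply each kernel at strided locations and return one token per location.

    video:   nested list indexed as video[t][y][x] (single channel, assumed layout)
    kernels: list of nested lists indexed as kernel[dt][dy][dx], all the same shape;
             each kernel contributes one feature of the token
    stride:  (st, sh, sw) step sizes; strides larger than the kernel skip input
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernels[0]), len(kernels[0][0]), len(kernels[0][0][0])
    st, sh, sw = stride
    tokens = []
    for t0 in range(0, T - kt + 1, st):
        for y0 in range(0, H - kh + 1, sh):
            for x0 in range(0, W - kw + 1, sw):
                # One token: the response of every kernel at this location.
                token = [
                    sum(
                        k[dt][dy][dx] * video[t0 + dt][y0 + dy][x0 + dx]
                        for dt in range(kt)
                        for dy in range(kh)
                        for dx in range(kw)
                    )
                    for k in kernels
                ]
                tokens.append(token)
    return tokens


# Toy 8-frame, 8x8 video and a 2x2x2 kernel (all values are placeholders).
video = [[[1.0] * 8 for _ in range(8)] for _ in range(8)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]

sparse = sparse_video_tokens(video, [kernel], stride=(4, 4, 4))
dense = sparse_video_tokens(video, [kernel], stride=(2, 2, 2))

# A single image is treated as a one-frame video with a kernel of temporal size 1.
image = video[:1]
image_kernel = [[[1.0] * 2 for _ in range(2)]]
image_tokens = sparse_video_tokens(image, [image_kernel], stride=(1, 4, 4))

print(len(sparse), len(dense), len(image_tokens))
```

In this sketch the sparse stride yields far fewer tokens than the dense one for the same video, which is the kind of saving that would shorten the sequence passed to a transformer encoder. The same function handles the image case by using a kernel with temporal extent 1.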

    UNSUPERVISED DEPTH PREDICTION NEURAL NETWORKS

    Publication number: US20210319578A1

    Publication date: 2021-10-14

    Application number: US17272419

    Application date: 2019-09-05

    Applicant: Google LLC

    Abstract: A system for generating a depth output for an image is described. The system receives input images that depict the same scene, each input image including one or more potential objects. The system generates, for each input image, a respective background image and processes the background images to generate a camera motion output that characterizes the motion of the camera between the input images. For each potential object, the system generates a respective object motion output for the potential object based on the input images and the camera motion output. The system processes a particular input image of the input images using a depth prediction neural network (NN) to generate a depth output for the particular input image, and updates the current values of parameters of the depth prediction NN based on the particular depth output, the camera motion output, and the object motion outputs for the potential objects.

    Image depth prediction neural networks

    Publication number: US10929996B2

    Publication date: 2021-02-23

    Application number: US16332991

    Application date: 2017-09-12

    Applicant: Google LLC

    Abstract: A system includes an image depth prediction neural network implemented by one or more computers. The image depth prediction neural network is a recurrent neural network that is configured to receive a sequence of images and, for each image in the sequence: process the image in accordance with a current internal state of the recurrent neural network to (i) update the current internal state and (ii) generate a depth output that characterizes a predicted depth of a future image in the sequence.

    Dynamic training of Models
    Document type: Published application

    Publication number: US20240029413A1

    Publication date: 2024-01-25

    Application number: US18350845

    Application date: 2023-07-12

    Applicant: Google LLC

    CPC classification number: G06V10/774 G06V10/25 G06V2201/07

    Abstract: A method involves training a model by dynamically adjusting the number of examples within each training batch. The dynamic adjustment is accomplished by adjusting the number of examples per task within each training batch according to the performance of the model on the tasks that the model is being trained on. In some embodiments, this method is applied to cross-modal vision-language tasks. This method may also be applied to the pre-training of a model that can later be fine-tuned for one or more specific tasks.
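
The per-task batch adjustment described in the abstract above can be sketched as follows. This is a hedged illustration, not the claimed method: it assumes that "performance" is measured by a recent per-task loss and that tasks with higher loss receive proportionally more slots in the next batch. The function name `allocate_batch`, the `min_per_task` floor, the largest-remainder rounding, and the task names are all hypothetical.

```python
def allocate_batch(task_losses, batch_size, min_per_task=1):
    """Split batch_size examples across tasks in proportion to their loss.

    task_losses:  dict mapping task name to a recent loss (assumed performance signal)
    batch_size:   total number of examples in the next training batch
    min_per_task: floor so that no task disappears from the batch (an assumption)
    """
    tasks = sorted(task_losses)
    remaining = batch_size - min_per_task * len(tasks)
    if remaining < 0:
        raise ValueError("batch_size is too small for the per-task minimum")

    total = sum(task_losses.values())
    # Fall back to equal weights if every loss is zero.
    weights = task_losses if total > 0 else {t: 1.0 for t in tasks}
    total = sum(weights.values())

    # Ideal fractional share of the remaining slots for each task.
    raw = {t: remaining * weights[t] / total for t in tasks}
    counts = {t: min_per_task + int(raw[t]) for t in tasks}

    # Hand out slots lost to rounding, largest fractional part first.
    leftover = batch_size - sum(counts.values())
    by_fraction = sorted(tasks, key=lambda t: raw[t] - int(raw[t]), reverse=True)
    for t in by_fraction[:leftover]:
        counts[t] += 1
    return counts


# Placeholder losses for three hypothetical vision-language tasks.
losses = {"captioning": 2.0, "vqa": 1.0, "retrieval": 1.0}
counts = allocate_batch(losses, batch_size=32)
print(counts)
```

Calling `allocate_batch` again after each evaluation step would shift examples toward whichever task the model currently handles worst, while the per-task floor keeps every task represented in each batch.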

    Unsupervised learning of image depth and ego-motion prediction neural networks

    Publication number: US11790549B2

    Publication date: 2023-10-17

    Application number: US17826849

    Application date: 2022-05-27

    Applicant: Google LLC

    Abstract: A system includes a neural network implemented by one or more computers, in which the neural network includes an image depth prediction neural network and a camera motion estimation neural network. The neural network is configured to receive a sequence of images. The neural network is configured to process each image in the sequence of images using the image depth prediction neural network to generate, for each image, a respective depth output that characterizes a depth of the image, and to process a subset of images in the sequence of images using the camera motion estimation neural network to generate a camera motion output that characterizes the motion of a camera between the images in the subset. The image depth prediction neural network and the camera motion estimation neural network have been jointly trained using an unsupervised learning technique.

    Unsupervised depth prediction neural networks

    Publication number: US11783500B2

    Publication date: 2023-10-10

    Application number: US17272419

    Application date: 2019-09-05

    Applicant: Google LLC

    Abstract: A system for generating a depth output for an image is described. The system receives input images that depict the same scene, each input image including one or more potential objects. The system generates, for each input image, a respective background image and processes the background images to generate a camera motion output that characterizes the motion of the camera between the input images. For each potential object, the system generates a respective object motion output for the potential object based on the input images and the camera motion output. The system processes a particular input image of the input images using a depth prediction neural network (NN) to generate a depth output for the particular input image, and updates the current values of parameters of the depth prediction NN based on the particular depth output, the camera motion output, and the object motion outputs for the potential objects.
