DEPTH ESTIMATION BASED ON FEATURE RECONSTRUCTION WITH ADAPTIVE MASKING AND MOTION PREDICTION

    公开(公告)号:US20250148633A1

    公开(公告)日:2025-05-08

    申请号:US18666502

    申请日:2024-05-16

    Abstract: Systems and techniques are provided for generating depth information. For example, a process can include obtaining a first feature volume including visual features corresponding to each respective frame included in a first set of frames. A first query generator network can generate reconstruction features associated with a reconstructed feature volume corresponding to the first feature volume. Based on the first feature volume, a second query generator network can generate motion features associated with predicted future motion corresponding to the first feature volume. An initial depth prediction can be generated for each respective frame based on cross-attention between features of a depth prediction decoder, the reconstruction features, and the motion features. A refined depth prediction can be generated for each respective based on cross-attention between the initial depth prediction, the reconstruction features, and the motion features.

    THREE-DIMENSIONAL OBJECT PART SEGMENTATION USING A MACHINE LEARNING MODEL

    公开(公告)号:US20240144589A1

    公开(公告)日:2024-05-02

    申请号:US18177028

    申请日:2023-03-01

    Abstract: Systems and techniques are provided for part segmentation. For example, a process for performing part segmentation can include obtaining a three-dimensional capture of an object. The method can include generating one or more two-dimensional images of the object from the three-dimensional capture of the object. The method can further include processing the one or more two-dimensional images of the object to generate at least one two-dimensional bounding box associated with a part of the object. The method can include performing three-dimensional part segmentation of the part of the object based on a three-dimensional point cloud generated from the one or more two-dimensional images of the object and the at least one two-dimensional bounding box and based on semantically labeled super points which are merged into subgroups associated with the part of the object.

    PLANAR MESH RECONSTRUCTION USING IMAGES FROM MULTIPLE CAMERA POSES

    公开(公告)号:US20240386650A1

    公开(公告)日:2024-11-21

    申请号:US18509113

    申请日:2023-11-14

    Abstract: Systems and techniques are provided for processing image data corresponding to a scene. A process can include generating a planar distance map including a planar distance value for each pixel of at least one image corresponding to the scene. Planar segmentation is performed based on the planar distance map, a normal map corresponding to the at least one image, and positional encoding information of the planar distance map. A triangular mesh fragment is initialized based on sampling points from each planar segment of a plurality of planar segments from the planar segmentation. Ray-triangle intersections are determined based on performing ray casting for a reconstructed planar mesh including a plurality of triangular mesh fragments each corresponding to a different image. A planar reconstruction and segmentation machine learning network is optimized for the scene, based on training the planar reconstruction and segmentation machine learning network using one or more loss functions.

    TRANSFORMER-BASED ARCHITECTURE FOR TRANSFORM CODING OF MEDIA

    公开(公告)号:US20230100413A1

    公开(公告)日:2023-03-30

    申请号:US17486732

    申请日:2021-09-27

    Abstract: Systems and techniques are described herein for processing media data using a neural network system. For instance, a process can include obtaining a latent representation of a frame of encoded image data and generating, by a plurality of decoder transformer layers of a decoder sub-network using the latent representation of the frame of encoded image data as input, a frame of decoded image data. At least one decoder transformer layer of the plurality of decoder transformer layers includes: one or more transformer blocks for generating one or more patches of features and determine self-attention locally within one or more window partitions and shifted window partitions applied over the one or more patches; and a patch un-merging engine for decreasing a respective size of each patch of the one or more patches.

    DEPTH COMPLETION USING ATTENTION-BASED REFINEMENT OF FEATURES

    公开(公告)号:US20250148628A1

    公开(公告)日:2025-05-08

    申请号:US18633302

    申请日:2024-04-11

    Abstract: Systems and techniques are provided for generating depth information from one or more images. For example, a process can include obtaining a first depth map corresponding to an input comprising an image of the one or more images and a sparse depth measurement. A three-dimensional (3D) point cloud can be generated based on the first depth map and multi-scale visual features of the input, wherein the 3D point cloud includes a plurality of 3D point features uplifted from the multi-scale visual features. At least a portion of the plurality of 3D point features can be processed using one or more self-attention layers to generate refined 3D point features. A two-dimensional (2D) projection of the refined 3D point features can be generated and a second depth map can be generated based on the 2D projection of the refined 3D point features.

    PHYSICALLY-BASED EMITTER ESTIMATION FOR INDOOR SCENES

    公开(公告)号:US20240303913A1

    公开(公告)日:2024-09-12

    申请号:US18180797

    申请日:2023-03-08

    CPC classification number: G06T15/506 G06T7/593

    Abstract: Systems and techniques are provided for physical-based light estimation for inverse rendering of indoor scenes. For example, a computing device can obtain an estimated scene geometry based on a multi-view observation of a scene. The computing device can further obtain a light emission mask based on the multi-view observation of the scene. The computing device can also obtain an emitted radiance field based on the multi-view observation of the scene. The computing device can then determine, based on the light emission mask and the emitted radiance field, a geometry of at least one light source of the estimated scene geometry.

Patent Agency Ranking