-
Publication number: US20250111476A1
Publication date: 2025-04-03
Application number: US18890544
Filing date: 2024-09-19
Applicant: NVIDIA Corporation
Inventor: Benjamin David Eckart, Anthea Li, Chao Liu, Kevin Shih, Jan Kautz
IPC: G06T3/4046
Abstract: Parametric distributions of data are one type of data model that can be used for various purposes, such as computer vision tasks including classification, segmentation, and 3D reconstruction. These parametric distributions may be computed from a given data set, which may be unstructured and/or may include low-dimensional data. Current solutions for learning parametric distributions of data involve explicitly learning kernel parameters. However, this explicit learning approach is not only inefficient in that it requires a high computational cost (i.e., a large number of floating-point operations), but it also leaves room for improvement in terms of accuracy of the resulting learned model. The present disclosure provides a neural network architecture that implicitly learns a parametric distribution of data, which can reduce the computational cost while improving accuracy when compared with prior solutions that rely on the explicit learning design.
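For context, the sketch below shows one way a network can implicitly produce a parametric distribution from an unstructured point set: rather than explicitly optimizing kernel parameters, a permutation-invariant encoder outputs Gaussian-mixture parameters in a single forward pass. This is a minimal illustration under assumed choices (a diagonal-covariance GMM, the PointSetToGMM module name, a PyTorch implementation); it is not the disclosed architecture.

```python
# Minimal sketch (assumptions, not the patented architecture): a permutation-
# invariant encoder maps an unstructured 3D point set to Gaussian-mixture
# parameters, so the parametric distribution is produced implicitly by a
# forward pass rather than fit by explicit kernel-parameter optimization.
import math

import torch
import torch.nn as nn


class PointSetToGMM(nn.Module):
    def __init__(self, k: int = 16, hidden: int = 128):
        super().__init__()
        self.k = k
        self.point_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Per mixture component: weight logit (1), mean (3), log-variance (3).
        self.head = nn.Linear(hidden, k * 7)

    def forward(self, points: torch.Tensor):
        # points: (B, N, 3) unstructured, low-dimensional data.
        feat = self.point_mlp(points).max(dim=1).values          # (B, hidden)
        params = self.head(feat).view(-1, self.k, 7)
        weights = params[..., 0].softmax(dim=-1)                 # (B, K)
        means = params[..., 1:4]                                 # (B, K, 3)
        variances = params[..., 4:7].exp()                       # (B, K, 3) diagonal
        return weights, means, variances


def gmm_log_likelihood(points, weights, means, variances):
    # Average log-likelihood of the points under the predicted diagonal GMM;
    # its negative serves as the training loss.
    diff = points.unsqueeze(2) - means.unsqueeze(1)              # (B, N, K, 3)
    log_comp = -0.5 * ((diff ** 2 / variances.unsqueeze(1)).sum(-1)
                       + variances.unsqueeze(1).log().sum(-1)
                       + 3.0 * math.log(2.0 * math.pi))
    return torch.logsumexp(weights.unsqueeze(1).log() + log_comp, dim=-1).mean()


if __name__ == "__main__":
    pts = torch.randn(2, 1024, 3)
    model = PointSetToGMM()
    loss = -gmm_log_likelihood(pts, *model(pts))
    loss.backward()
```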
-
Publication number: US11869149B2
Publication date: 2024-01-09
Application number: US17744467
Filing date: 2022-05-13
Applicant: NVIDIA CORPORATION
Inventor: Ben Eckart, Christopher Choy, Chao Liu, Yurong You
CPC classification number: G06T17/10, G06N3/045, G06N3/084, G06T19/20, G06T2219/2016
Abstract: In various embodiments, an unsupervised training application executes a first neural network on a first point cloud to generate keys and values. The unsupervised training application generates output vectors based on a first query set, the keys, and the values, and then computes spatial features based on the output vectors. The unsupervised training application computes quantized context features based on the output vectors and a first set of codes representing a first set of 3D geometry blocks. The unsupervised training application modifies the first neural network based on a likelihood of reconstructing the first point cloud, the quantized context features, and the spatial features to generate an updated neural network. A trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks, and maps a point cloud to a representation of 3D geometry instances.
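A hedged sketch of the stages the abstract names, under assumed PyTorch modules and illustrative dimensions: a point encoder emits keys and values, a learned query set attends over them, and the output vectors yield spatial features plus context features quantized against a learned codebook of codes for 3D geometry blocks. Training would then maximize the likelihood of reconstructing the input point cloud from these features; none of the names below come from the disclosure.

```python
# Hedged sketch (illustrative names and sizes), not the patented model.
import torch
import torch.nn as nn


class PointCloudEncoderSketch(nn.Module):
    def __init__(self, d: int = 64, n_queries: int = 32, n_codes: int = 128):
        super().__init__()
        self.point_encoder = nn.Linear(3, d)                    # per-point keys/values
        self.queries = nn.Parameter(torch.randn(n_queries, d))  # learned query set
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.codebook = nn.Parameter(torch.randn(n_codes, d))   # codes for geometry blocks
        self.spatial_head = nn.Linear(d, 3)

    def forward(self, points: torch.Tensor):
        # points: (B, N, 3)
        kv = self.point_encoder(points)                          # keys and values
        q = self.queries.unsqueeze(0).expand(points.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)                            # output vectors (B, Q, d)
        spatial = self.spatial_head(out)                         # spatial features
        # Quantize context features to the nearest codebook entry.
        dist = torch.cdist(out, self.codebook.expand(out.size(0), -1, -1))
        idx = dist.argmin(dim=-1)
        quantized = self.codebook[idx]
        # Straight-through estimator so reconstruction gradients reach the encoder.
        quantized = out + (quantized - out).detach()
        return spatial, quantized, idx
```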
-
Publication number: US20200160546A1
Publication date: 2020-05-21
Application number: US16439539
Filing date: 2019-06-12
Applicant: NVIDIA Corporation
Inventor: Jinwei Gu, Kihwan Kim, Chao Liu
Abstract: Techniques for estimating depth for a video stream captured by a monocular image sensor are disclosed. A sequence of image frames is captured by the monocular image sensor. A first neural network is configured to process at least a portion of the sequence of image frames to generate a depth probability volume (DPV). The depth probability volume includes a plurality of probability maps corresponding to a number of discrete depth candidate locations over a range of depths defined for the scene. The depth probability volume can be updated using a second neural network that is configured to generate adaptive gain parameters to integrate the DPVs over time. A third neural network is configured to refine the updated depth probability volume from a lower resolution to a higher resolution that matches the original resolution of the sequence of image frames. A depth map can be calculated based on the depth probability volume.
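As an illustration of the data structure involved, the sketch below treats the depth probability volume (DPV) as a per-pixel distribution over D candidate depths, reads off a depth map as its expectation, and writes the temporal integration as a gain-weighted blend. The gain would come from the second network in the described pipeline; here it is simply an input tensor, so this is an assumption-laden sketch rather than the disclosed method.

```python
# Illustrative DPV operations (assumptions, not the patented networks).
import torch


def expected_depth(dpv: torch.Tensor, depth_candidates: torch.Tensor) -> torch.Tensor:
    # dpv: (B, D, H, W) probabilities summing to 1 along D.
    # depth_candidates: (D,) discrete depths spanning the scene's depth range.
    return (dpv * depth_candidates.view(1, -1, 1, 1)).sum(dim=1)    # (B, H, W)


def integrate_dpv(prior_dpv, measured_dpv, gain):
    # gain: (B, 1, H, W) adaptive weight (assumed to be predicted by a second
    # network); blends the prior with the current measurement and renormalizes
    # so each pixel remains a valid distribution over depth candidates.
    fused = gain * prior_dpv + (1.0 - gain) * measured_dpv
    return fused / fused.sum(dim=1, keepdim=True).clamp_min(1e-8)


if __name__ == "__main__":
    B, D, H, W = 1, 64, 60, 80
    candidates = torch.linspace(0.5, 10.0, D)
    dpv = torch.softmax(torch.randn(B, D, H, W), dim=1)
    depth = expected_depth(dpv, candidates)                          # (1, 60, 80) depth map
```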
-
Publication number: US20240104842A1
Publication date: 2024-03-28
Application number: US18472653
Filing date: 2023-09-22
Applicant: NVIDIA Corporation
Inventor: Koki Nagano, Alexander Trevithick, Chao Liu, Eric Ryan Chan, Sameh Khamis, Michael Stengel, Zhiding Yu
IPC: G06T17/00, G06T5/20, G06T7/70, G06T7/90, G06V10/771
CPC classification number: G06T17/00, G06T5/20, G06T7/70, G06T7/90, G06V10/771, G06T2207/10024
Abstract: A method for generating, by an encoder-based model, a three-dimensional (3D) representation of a two-dimensional (2D) image is provided. The encoder-based model is trained to infer the 3D representation using a synthetic training data set generated by a pre-trained model. The pre-trained model is a 3D generative model that produces a 3D representation and a corresponding 2D rendering; these pairs serve as a pseudo ground truth synthetic training data set for training a separate encoder-based model on downstream tasks, such as estimating a triplane representation, neural radiance field, mesh, depth map, or 3D key points from a single input image. In a particular embodiment, the encoder-based model is trained to predict a triplane representation of the input image, which can then be rendered by a volume renderer according to pose information to generate an output image of the 3D scene from the corresponding viewpoint.
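For readers unfamiliar with triplanes, the self-contained sketch below shows how a triplane representation is commonly queried: a 3D point is projected onto the XY, XZ, and YZ feature planes, each plane is bilinearly sampled, and the three features are aggregated. This reflects the general technique as described in the literature, not the specific encoder or volume renderer of this disclosure; the function name and summation aggregation are assumptions.

```python
# Generic triplane lookup (an assumed convention, not the disclosed renderer).
import torch
import torch.nn.functional as F


def sample_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    # planes: (B, 3, C, H, W) feature planes; points: (B, N, 3) in [-1, 1]^3.
    projections = (points[..., [0, 1]],    # XY plane
                   points[..., [0, 2]],    # XZ plane
                   points[..., [1, 2]])    # YZ plane
    feats = []
    for i, uv in enumerate(projections):
        grid = uv.unsqueeze(1)                                   # (B, 1, N, 2)
        sampled = F.grid_sample(planes[:, i], grid,
                                mode="bilinear", align_corners=True)
        feats.append(sampled.squeeze(2).transpose(1, 2))         # (B, N, C)
    return torch.stack(feats).sum(dim=0)                         # aggregate the three planes


if __name__ == "__main__":
    planes = torch.randn(1, 3, 32, 64, 64)
    pts = torch.rand(1, 2048, 3) * 2 - 1
    features = sample_triplane(planes, pts)                      # (1, 2048, 32)
```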
-
Publication number: US10984545B2
Publication date: 2021-04-20
Application number: US16439539
Filing date: 2019-06-12
Applicant: NVIDIA Corporation
Inventor: Jinwei Gu, Kihwan Kim, Chao Liu
Abstract: Techniques for estimating depth for a video stream captured by a monocular image sensor are disclosed. A sequence of image frames is captured by the monocular image sensor. A first neural network is configured to process at least a portion of the sequence of image frames to generate a depth probability volume (DPV). The depth probability volume includes a plurality of probability maps corresponding to a number of discrete depth candidate locations over a range of depths defined for the scene. The depth probability volume can be updated using a second neural network that is configured to generate adaptive gain parameters to integrate the DPVs over time. A third neural network is configured to refine the updated depth probability volume from a lower resolution to a higher resolution that matches the original resolution of the sequence of image frames. A depth map can be calculated based on the depth probability volume.
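Complementing the sketch under the earlier publication of this same application, the snippet below illustrates the remaining steps the abstract mentions: upsampling a low-resolution DPV back to the frame resolution (the role attributed to the third network) and deriving a per-pixel confidence alongside the depth map. The function names and the bilinear upsampling choice are assumptions, not the disclosed refinement network.

```python
# Assumed post-processing of a DPV, not the patented refinement network.
import torch
import torch.nn.functional as F


def upsample_dpv(dpv_lowres, size):
    # dpv_lowres: (B, D, h, w) -> (B, D, H, W), renormalized along the depth axis.
    up = F.interpolate(dpv_lowres, size=size, mode="bilinear", align_corners=False)
    return up / up.sum(dim=1, keepdim=True).clamp_min(1e-8)


def depth_and_confidence(dpv, depth_candidates):
    # Depth map as the expectation over candidates; confidence as the peak probability.
    depth = (dpv * depth_candidates.view(1, -1, 1, 1)).sum(dim=1)
    confidence = dpv.max(dim=1).values
    return depth, confidence
```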