-
Publication No.: US12182940B2
Publication date: 2024-12-31
Application No.: US17578051
Filing date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Varun Jampani , Jan Kautz
IPC: G06T15/00 , G06F18/21 , G06T7/40 , G06T7/73 , G06T17/20 , G06V10/26 , G06V10/776 , G06V10/82 , G06V20/64
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication No.: US12175350B2
Publication date: 2024-12-24
Application No.: US16566797
Filing date: 2019-09-10
Applicant: NVIDIA Corporation
Inventor: Arash Vahdat , Arun Mohanray Mallya , Ming-Yu Liu , Jan Kautz
Abstract: In at least one embodiment, differentiable neural architecture search and reinforcement learning are combined under one framework to discover network architectures with desired properties such as high accuracy, low latency, or both. In at least one embodiment, an objective function for search based on generalization error prevents the selection of architectures prone to overfitting.
-
Publication No.: US20240416963A1
Publication date: 2024-12-19
Application No.: US18379601
Filing date: 2023-10-12
Applicant: NVIDIA Corporation
Inventor: Zhiqi Li , Zhiding Yu , David Austin , Shiyi Lan , Jan Kautz , Jose Manuel Alvarez Lopez
Abstract: Apparatuses, systems, and techniques of using one or more machine learning processes (e.g., neural network(s)) to predict occupancy using an image input. In at least one embodiment, image data is processed using a neural network to predict occupancy in a 3D voxel space. In at least one embodiment, image data is processed using a neural network to detect objects in a 3D space.
-
Publication No.: US20240185396A1
Publication date: 2024-06-06
Application No.: US18222725
Filing date: 2023-07-17
Applicant: NVIDIA Corporation
Inventor: Ali Hatamizadeh , Jiaming Song , Jan Kautz , Arash Vahdat
CPC classification number: G06T5/002 , G06T1/20 , G06T7/0002 , G06T2207/20081 , G06T2207/20182
Abstract: Apparatuses, systems, and techniques to generate images. In at least one embodiment, one or more machine learning models generate an output image based, at least in part, on calculating attention scores using time embeddings.
-
Publication No.: US11790633B2
Publication date: 2023-10-17
Application No.: US17365877
Filing date: 2021-07-01
Applicant: NVIDIA Corporation
Inventor: Zhiding Yu , Rui Huang , Wonmin Byeon , Sifei Liu , Guilin Liu , Thomas Breuel , Anima Anandkumar , Jan Kautz
IPC: G06V10/50 , G06N3/04 , G06T7/13 , G06V10/75 , G06F18/2413
CPC classification number: G06V10/50 , G06F18/2413 , G06N3/04 , G06T7/13 , G06V10/758
Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
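Step (3) of the abstract above can be illustrated with a toy one-dimensional scan. This is only a sketch under stated assumptions: the abstract does not give the propagation formula, so the gate `affinity * (1 - edge)`, the left-to-right scan order, and the function name `propagate_1d` are all hypothetical choices made for illustration, not the patented method.

```python
from typing import List


def propagate_1d(features: List[float],
                 affinity: List[float],
                 edge: List[float]) -> List[float]:
    """Smooth a 1D feature row from left to right.

    Assumption (not from the abstract): the gate at position i is
    affinity[i] * (1 - edge[i]), so a strong edge (edge near 1)
    blocks smoothing across that position.
    """
    if not (len(features) == len(affinity) == len(edge)):
        raise ValueError("all inputs must have the same length")
    if not features:
        return []
    refined = [features[0]]  # first position has no left neighbour
    for i in range(1, len(features)):
        gate = affinity[i] * (1.0 - edge[i])
        # Blend the current feature with the already refined left neighbour.
        refined.append((1.0 - gate) * features[i] + gate * refined[i - 1])
    return refined
```

With this gate, a position marked as a full edge keeps its own feature value unchanged, while positions away from edges are pulled toward their refined neighbour in proportion to the affinity.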
-
Publication No.: US11704857B2
Publication date: 2023-07-18
Application No.: US17734244
Filing date: 2022-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
CPC classification number: G06T15/04 , G06T7/579 , G06T7/70 , G06T15/20 , G06T17/20 , G06T2207/10016 , G06T2207/20084 , G06T2207/30244
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
-
Publication No.: US20230144458A1
Publication date: 2023-05-11
Application No.: US18051209
Filing date: 2022-10-31
Applicant: NVIDIA Corporation
Inventor: Alexander Malafeev , Shalini De Mello , Jaewoo Seo , Umar Iqbal , Koki Nagano , Jan Kautz , Simon Yuen
CPC classification number: G06V40/174 , G06V40/171 , G06V40/165 , G06V10/82 , G06T13/40
Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized, then applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.
-
Publication No.: US20220391781A1
Publication date: 2022-12-08
Application No.: US17827446
Filing date: 2022-05-27
Applicant: NVIDIA Corporation
Inventor: Or Litany , Haggai Maron , David Jesus Acuna Marrero , Jan Kautz , Sanja Fidler , Gal Chechik
Abstract: A method performed by a server is provided. The method comprises sending copies of a set of parameters of a hyper network (HN) to at least one client device, receiving from each client device in the at least one client device, a corresponding set of updated parameters of the HN, and determining a next set of parameters of the HN based on the corresponding sets of updated parameters received from the at least one client device. Each client device generates the corresponding set of updated parameters based on a local model architecture of the client device.
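The server-side round described in this abstract (send copies of the HN parameters, collect updated parameters, determine the next set) can be sketched as follows. The abstract does not say how the next parameters are determined, so plain element-wise averaging is an assumption here, and the names `aggregate_hn_params` and `server_round`, as well as modelling each client as a Python callable, are hypothetical.

```python
import copy
from typing import Callable, Dict, List

Params = Dict[str, List[float]]
Client = Callable[[Params], Params]


def aggregate_hn_params(client_updates: List[Params]) -> Params:
    """Average each named parameter vector across client updates.

    Assumption: simple unweighted averaging; the abstract only says the
    next parameters are "based on" the received updates.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    n = len(client_updates)
    first = client_updates[0]
    return {
        name: [sum(update[name][i] for update in client_updates) / n
               for i in range(len(values))]
        for name, values in first.items()
    }


def server_round(hn_params: Params, clients: List[Client]) -> Params:
    """Run one round: send a copy to every client, then aggregate."""
    # Each client receives its own copy so local updates cannot alias
    # the server's parameters or each other.
    updates = [client(copy.deepcopy(hn_params)) for client in clients]
    return aggregate_hn_params(updates)
```

In this sketch each client would compute its update using its own local model architecture, which is the part the abstract leaves to the client device.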
-
Publication No.: US11514293B2
Publication date: 2022-11-29
Application No.: US16564978
Filing date: 2019-09-09
Applicant: NVIDIA Corporation
Inventor: Ruben Villegas , Alejandro Troccoli , Iuri Frosio , Stephen Tyree , Wonmin Byeon , Jan Kautz
Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
-
Publication No.: US11417011B2
Publication date: 2022-08-16
Application No.: US16897057
Filing date: 2020-06-09
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal , Pavlo Molchanov , Jan Kautz
Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data however has various drawbacks, including for example that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult since it requires manually provided annotations for 2D images, which is time consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.