-
Publication number: US12182940B2
Publication date: 2024-12-31
Application number: US17578051
Filing date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Varun Jampani , Jan Kautz
IPC: G06T15/00 , G06F18/21 , G06T7/40 , G06T7/73 , G06T17/20 , G06V10/26 , G06V10/776 , G06V10/82 , G06V20/64
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication number: US11790633B2
Publication date: 2023-10-17
Application number: US17365877
Filing date: 2021-07-01
Applicant: Nvidia Corporation
Inventor: Zhiding Yu , Rui Huang , Wonmin Byeon , Sifei Liu , Guilin Liu , Thomas Breuel , Anima Anandkumar , Jan Kautz
IPC: G06V10/50 , G06N3/04 , G06T7/13 , G06V10/75 , G06F18/2413
CPC classification number: G06V10/50 , G06F18/2413 , G06N3/04 , G06T7/13 , G06V10/758
Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
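The edge-controlled smoothing described in step (3) can be illustrated with a minimal one-dimensional sketch. It is not the patented layer: the function name `gated_propagation_1d` and the gate formula `affinity * (1 - edge)` are assumptions chosen only to show how a strong edge value can block propagation while a high affinity value permits it.

```python
def gated_propagation_1d(features, affinity, edge):
    """Smooth a 1D feature row from left to right.

    Hypothetical gate (an assumption, not taken from the patent):
    gate = affinity * (1 - edge), so an edge value of 1 stops
    smoothing across that position and an edge value of 0 lets the
    affinity value alone decide how much is propagated.
    """
    refined = [features[0]]
    for i in range(1, len(features)):
        gate = affinity[i] * (1.0 - edge[i])
        # Blend the pixel's own feature with its refined left neighbor.
        refined.append((1.0 - gate) * features[i] + gate * refined[i - 1])
    return refined


if __name__ == "__main__":
    row = gated_propagation_1d(
        features=[1.0, 0.0, 0.0, 4.0],
        affinity=[0.0, 0.5, 0.5, 0.5],
        edge=[0.0, 0.0, 0.0, 1.0],  # a semantic edge at the last position
    )
    print(row)
```

In this toy run the first three values are blended together, while the last value is left unchanged because the edge value there closes the gate.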
-
Publication number: US11704857B2
Publication date: 2023-07-18
Application number: US17734244
Filing date: 2022-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
CPC classification number: G06T15/04 , G06T7/579 , G06T7/70 , G06T15/20 , G06T17/20 , G06T2207/10016 , G06T2207/20084 , G06T2207/30244
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
-
Publication number: US10762425B2
Publication date: 2020-09-01
Application number: US16134716
Filing date: 2018-09-18
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Shalini De Mello , Jinwei Gu , Ming-Hsuan Yang , Jan Kautz
Abstract: A spatial linear propagation network (SLPN) system learns the affinity matrix for vision tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The SLPN system is trained for a particular computer vision task and refines an input map (i.e., affinity matrix) that indicates pixels that share a particular property (e.g., color, object, texture, shape, etc.). Inputs to the SLPN system are input data (e.g., pixel values for an image) and the input map corresponding to the input data to be propagated. The input data is processed to produce task-specific affinity values (guidance data). The task-specific affinity values are applied to values in the input map, with at least two weighted values from each column contributing to a value in the refined map data for the adjacent column.
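The column-to-column propagation described in the last sentence can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `propagate_left_to_right`, the three-neighbor connection (upper, same-row, and lower pixel of the previous column), and the `(1 - sum of weights)` term applied to the pixel's own input value are all assumptions made for the example. In the described system the weights would be produced from the input data by a trained network; here they are supplied by hand.

```python
def propagate_left_to_right(input_map, weights):
    """Refine a 2D map by one left-to-right linear propagation pass.

    input_map: list of rows, each a list of floats.
    weights:   weights[r][c] is a hypothetical (w_up, w_mid, w_down)
               tuple applied to the three neighbors of pixel (r, c)
               in the previous, already refined column.
    """
    rows, cols = len(input_map), len(input_map[0])
    refined = [row[:] for row in input_map]
    for c in range(1, cols):
        for r in range(rows):
            propagated = 0.0
            weight_sum = 0.0
            for offset, w in zip((-1, 0, 1), weights[r][c]):
                rr = r + offset
                if 0 <= rr < rows:  # skip neighbors outside the map
                    propagated += w * refined[rr][c - 1]
                    weight_sum += w
            # The remaining weight stays on the pixel's own input value.
            refined[r][c] = (1.0 - weight_sum) * input_map[r][c] + propagated
    return refined


if __name__ == "__main__":
    seed = [[1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0]]
    w = [[(0.0, 0.5, 0.0)] * 3 for _ in range(2)]
    print(propagate_left_to_right(seed, w))
```

A full system would typically repeat such a pass in several directions (for example right-to-left, top-to-bottom, and bottom-to-top) and merge the results; that step is omitted here.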
-
Publication number: US20240404174A1
Publication date: 2024-12-05
Application number: US18653723
Filing date: 2024-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Shalini De Mello , Sifei Liu , Koki Nagano , Umar Iqbal , Jan Kautz
Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing coarse 3D geometry for the head avatar, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression and pose is generated.
-
Publication number: US20240338871A1
Publication date: 2024-10-10
Application number: US18746911
Filing date: 2024-06-18
Applicant: NVIDIA Corporation
Inventor: Donghoom LEE , Sifei Liu , Jinwei Gu , Ming-Yu Liu , Jan Kautz
CPC classification number: G06T11/60 , G06F18/217 , G06F18/24 , G06T3/02 , G06T7/30 , G06V30/274 , G06T7/70 , G06T2207/20081 , G06T2207/20084 , G06T2210/12
Abstract: One embodiment of a method includes applying a first generator model to a semantic representation of an image to generate an affine transformation, where the affine transformation represents a bounding box associated with at least one region within the image. The method further includes applying a second generator model to the affine transformation and the semantic representation to generate a shape of an object. The method further includes inserting the object into the image based on the bounding box and the shape.
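One way to picture an affine transformation that "represents a bounding box" is to apply it to a canonical unit square and take the extent of the result. The sketch below does only that. The 2x3 matrix layout, the choice of the unit square as the canonical box, and the function name `affine_to_bounding_box` are assumptions made for illustration; the abstract does not specify them.

```python
def affine_to_bounding_box(theta):
    """Map the unit square through a 2x3 affine matrix.

    theta: ((a, b, tx), (c, d, ty)), a hypothetical parameter layout.
    Returns (x_min, y_min, x_max, y_max) of the transformed square.
    """
    (a, b, tx), (c, d, ty) = theta
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    xs = [a * x + b * y + tx for x, y in corners]
    ys = [c * x + d * y + ty for x, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Scale by (2, 4) and translate by (3, 5).
    print(affine_to_bounding_box(((2.0, 0.0, 3.0), (0.0, 4.0, 5.0))))
```

With a pure scale-and-translate matrix, the scale terms give the box size and the translation terms give its top-left corner.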
-
Publication number: US11748887B2
Publication date: 2023-09-05
Application number: US16378464
Filing date: 2019-04-08
Applicant: NVIDIA Corporation
Inventor: Varun Jampani , Wei-Chih Hung , Sifei Liu , Pavlo Molchanov , Jan Kautz
IPC: G06V10/00 , G06T7/11 , G06T7/143 , G06F17/15 , G06N3/088 , G06F18/40 , G06N3/045 , G06N3/047 , G06V10/764 , G06V10/82 , G06V10/94 , G06V20/40
CPC classification number: G06T7/11 , G06F17/15 , G06F18/40 , G06N3/045 , G06N3/047 , G06N3/088 , G06T7/143 , G06V10/764 , G06V10/82 , G06V10/945 , G06V20/41
Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
-
Publication number: US11594006B2
Publication date: 2023-02-28
Application number: US16998914
Filing date: 2020-08-20
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang , Xitong Yang , Sifei Liu , Jan Kautz
Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
-
Publication number: US20220036635A1
Publication date: 2022-02-03
Application number: US16945455
Filing date: 2020-07-31
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
-
Publication number: US20210150757A1
Publication date: 2021-05-20
Application number: US16690015
Filing date: 2019-11-20
Applicant: NVIDIA Corporation
Inventor: Siva Karthik Mustikovela , Varun Jampani , Shalini De Mello , Sifei Liu , Umar Iqbal , Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify orientations of one or more objects based, at least in part, on one or more characteristics of the objects other than their orientation.
-