-
Publication No.: US20180181809A1
Publication Date: 2018-06-28
Application No.: US15855887
Filing Date: 2017-12-27
Applicant: NVIDIA Corporation
Inventor: Rajeev Ranjan , Shalini De Mello , Jan Kautz
CPC classification number: G06K9/00604 , G06K9/00228 , G06K9/00255 , G06K9/00617 , G06K9/00973 , G06K9/00986 , G06K9/3216 , G06K9/4628 , G06K9/6256 , G06K9/627 , G06N3/04 , G06N3/0454 , G06N3/08 , G06T7/11 , G06T7/70 , G06T7/73 , G06T2200/04 , G06T2207/10024 , G06T2207/20081 , G06T2207/20084 , G06T2207/30201 , G06T2210/52
Abstract: A method, computer readable medium, and system are disclosed for performing unconstrained appearance-based gaze estimation. The method includes the steps of identifying an image of an eye and a head orientation associated with the image of the eye, determining an orientation for the eye by analyzing, within a convolutional neural network (CNN), the image of the eye and the head orientation associated with the image of the eye, and returning the orientation of the eye.
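The fusion step the abstract describes can be sketched roughly as follows: features from an eye-image CNN are concatenated with the head orientation, and a regression head maps the fused vector to gaze angles. All names, sizes, and the linear head are hypothetical stand-ins for the patent's CNN, not its implementation.

```python
import numpy as np

def fuse_and_regress(eye_features, head_orientation, W, b):
    """Concatenate eye-appearance features with head pose, regress gaze angles."""
    x = np.concatenate([eye_features, head_orientation])
    return W @ x + b

rng = np.random.default_rng(0)
eye_features = rng.standard_normal(128)    # stand-in for CNN output
head_orientation = np.array([0.1, -0.2])   # head yaw, pitch (radians)
W = rng.standard_normal((2, 130)) * 0.01   # learned in practice
b = np.zeros(2)

gaze = fuse_and_regress(eye_features, head_orientation, W, b)
print(gaze.shape)
```

Conditioning the regression on head pose is what lets a single model handle unconstrained viewpoints: the same eye appearance implies different gaze directions under different head orientations.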
-
102.
Publication No.: US20170206405A1
Publication Date: 2017-07-20
Application No.: US15402128
Filing Date: 2017-01-09
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Xiaodong Yang , Shalini De Mello , Kihwan Kim , Stephen Walter Tyree , Jan Kautz
CPC classification number: G06K9/00355 , G06K9/00201 , G06K9/00765 , G06K9/4628 , G06K9/4652 , G06K9/6251 , G06K9/6256 , G06K9/627 , G06K9/6277 , G06N3/0445 , G06N3/0454 , G06N3/084 , Y04S10/54
Abstract: A method, computer readable medium, and system are disclosed for detecting and classifying hand gestures. The method includes the steps of receiving an unsegmented stream of data associated with a hand gesture, extracting spatio-temporal features from the unsegmented stream by a three-dimensional convolutional neural network (3DCNN), and producing a class label for the hand gesture based on the spatio-temporal features.
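A rough sketch of operating on an unsegmented stream, as the abstract describes: a fixed-length window slides over the frame sequence and a scoring function (standing in for the 3D-CNN) yields per-class scores at every clip position. The toy scorer and all names here are hypothetical, not from the patent.

```python
import numpy as np

def classify_stream(stream, window, score_fn):
    """Return one predicted class label per window position."""
    labels = []
    for start in range(len(stream) - window + 1):
        clip = stream[start:start + window]      # (window, H, W) sub-clip
        labels.append(int(np.argmax(score_fn(clip))))
    return labels

def toy_scorer(clip):
    # Class 0 = "static", class 1 = "moving": score by mean frame change.
    motion = np.abs(np.diff(clip, axis=0)).mean()
    return np.array([1.0 - motion, motion])

frames = np.zeros((6, 4, 4))
frames[3:] = 1.0                                 # motion appears mid-stream
stream_labels = classify_stream(frames, window=2, score_fn=toy_scorer)
print(stream_labels)
```

The point of the sliding evaluation is that no prior segmentation of the gesture is required; the classifier itself localizes where in the stream the gesture occurs.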
-
Publication No.: US20250069191A1
Publication Date: 2025-02-27
Application No.: US18452634
Filing Date: 2023-08-21
Applicant: NVIDIA Corporation
Inventor: Iuri Frosio , Mayoore Selvarasa Jaiswal , Jan Kautz , Jianyuan Min
IPC: G06T5/50 , H04N23/743
Abstract: Systems and methods are disclosed related to synthetic bracketing for exposure correction. A deep learning based method and system produces a set of differently exposed images from a single input image. The images in the set may be combined to produce an output image with improved global and local exposure compared with the input image. An image encoder applies learned parameters to each input image to generate a set of image features including local exposure estimates for each of two or more regions of the input image and a low resolution latent representation of the input image. A decoder receives the local exposure estimates, the latent representation, and target enhancements that are processed to generate synthesized transformations. When applied to the input image, the synthesized transformations produce the set of transformed images. Each transformed image is a version of the input image synthesized to correspond to a respective target enhancement.
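The bracketing-then-fusion idea can be illustrated minimally: synthesize several differently exposed versions of one input (here via simple gain curves standing in for the learned encoder/decoder transformations), then fuse them into one output. The gain values and mean-fusion rule are illustrative assumptions, not the patented method.

```python
import numpy as np

def synthesize_bracket(image, gains):
    """Produce one transformed image per target enhancement (here, a gain)."""
    return [np.clip(image * g, 0.0, 1.0) for g in gains]

def fuse(bracket):
    """Combine the bracketed set into a single exposure-corrected image."""
    return np.mean(bracket, axis=0)

image = np.full((2, 2), 0.2)                  # underexposed toy image
bracket = synthesize_bracket(image, gains=[0.5, 1.0, 2.0, 4.0])
fused = fuse(bracket)
print(float(fused[0, 0]))
```

In the patented system the transformations are synthesized per-region from learned local exposure estimates, so bright and dark areas of the same image can be corrected differently.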
-
Publication No.: US20250045892A1
Publication Date: 2025-02-06
Application No.: US18593742
Filing Date: 2024-03-01
Applicant: NVIDIA Corporation
Inventor: Morteza Mardani , Jiaming Song , Jan Kautz , Arash Vahdat
Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. For example, they can be trained in the image domain to perform specific image restoration tasks, such as inpainting (e.g. completing an incomplete image), deblurring (e.g. removing blurring from an image), and super-resolution (e.g. increasing a resolution of an image), or they can be trained to perform image rendering tasks, including 2D-to-3D image generation tasks. However, current approaches to training diffusion models only allow the models to be optimized for a specific task, such that they will not achieve high-quality results when used for other tasks. The present disclosure provides a diffusion model that uses variational inference to approximate a distribution of data, which allows the diffusion model to universally solve different tasks without having to be re-trained specifically for each task.
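The task-agnostic structure can be sketched with a generic restoration loop: alternate a data-fidelity gradient step for a degradation model y = A @ x with a denoising prior (here the identity, standing in for the learned diffusion prior). Only the loop shape is illustrated; the function names, step sizes, and the toy operator are assumptions, not the patent's variational-inference scheme.

```python
import numpy as np

def restore(y, A, denoise, steps=200, lr=0.1):
    """Solve y = A @ x by alternating data fidelity and a denoising prior."""
    x = A.T @ y                              # crude initialization
    for _ in range(steps):
        x = x - lr * A.T @ (A @ x - y)       # pull toward consistency with y
        x = denoise(x)                       # inject the learned prior
    return x

A = np.array([[1.0, 0.0], [0.0, 0.5]])       # toy degradation operator
x_true = np.array([2.0, 4.0])
y = A @ x_true
x_hat = restore(y, A, denoise=lambda x: x)
print(x_hat)
```

Because only the operator A encodes the task (blur, masking, downsampling), the same prior can in principle be reused across tasks without retraining, which is the property the abstract claims.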
-
105.
Publication No.: US20240371096A1
Publication Date: 2024-11-07
Application No.: US18312102
Filing Date: 2023-05-04
Applicant: Nvidia Corporation
Inventor: Sameh Khamis , Koki Nagano , Jan Kautz , Sanja Fidler
Abstract: Approaches presented herein provide systems and methods for disentangling identity from expression in input models. One or more machine learning systems may be trained directly from three-dimensional (3D) points to develop unique latent codes for expressions associated with different identities. These codes may then be mapped to different identities to independently model an object, such as a face, to generate a new mesh including an expression for an independent identity. A pipeline may include a set of machine learning systems to determine model parameters and also adjust input expression codes using gradient backpropagation in order to train models for incorporation into a content development pipeline.
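The disentanglement idea can be illustrated with a toy decoder: a mesh is produced from an identity code and an expression code through separate (here linear) maps, so the same expression code can be re-applied to a different identity. The linear decoder and all codes are hypothetical stand-ins for the learned networks.

```python
import numpy as np

def decode_mesh(base, W_id, W_expr, identity_code, expression_code):
    """Vertices = base shape + identity offset + expression offset."""
    return base + W_id @ identity_code + W_expr @ expression_code

rng = np.random.default_rng(1)
n_coords = 12                                # flattened (4 vertices * xyz)
base = np.zeros(n_coords)
W_id = rng.standard_normal((n_coords, 3))
W_expr = rng.standard_normal((n_coords, 2))

id_a, id_b = rng.standard_normal(3), rng.standard_normal(3)
smile = np.array([1.0, 0.0])                 # a shared expression code

mesh_a = decode_mesh(base, W_id, W_expr, id_a, smile)
mesh_b = decode_mesh(base, W_id, W_expr, id_b, smile)
# With disentangled codes, the expression offset is identical across identities:
print(np.allclose(mesh_a - W_id @ id_a, mesh_b - W_id @ id_b))
```

Transferring an expression to a new identity then amounts to swapping the identity code while holding the expression code fixed.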
-
Publication No.: US20240169563A1
Publication Date: 2024-05-23
Application No.: US18509627
Filing Date: 2023-11-15
Applicant: NVIDIA Corporation
Inventor: Bowen Wen , Jonathan Tremblay , Valts Blukis , Jan Kautz , Stanley Thomas Birchfield
CPC classification number: G06T7/248 , G06T7/11 , G06T7/70 , G06T17/00 , G06T19/006 , G06T2207/10016 , G06T2207/10024 , G06T2207/10028 , G06T2207/20072 , G06T2207/20084 , G06T2207/30252
Abstract: Apparatuses, systems, and techniques for constructing a data structure to store a shape of an object based at least in part on a portion of multiple images, and obtaining poses of the object by tracking a pose of the object through the multiple images based at least in part on the data structure. Optionally, the poses may be used to generate a plan for a path of a device to travel, generate a rendering of at least a portion of a Mixed Reality (“MR”) display to be viewed by a user, and/or the like.
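The geometric core of pose tracking can be sketched in isolation: given 3D points of the stored object model and their observed positions in the current frame, recover the rigid pose (R, t) by the classical Kabsch/Procrustes method. The neural data structure and correspondence search the patent describes are not modeled here.

```python
import numpy as np

def estimate_pose(model_pts, observed_pts):
    """Least-squares rigid transform mapping model points onto observations."""
    cm, co = model_pts.mean(axis=0), observed_pts.mean(axis=0)
    H = (model_pts - cm).T @ (observed_pts - co)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = co - R @ cm
    return R, t

theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
model = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
observed = model @ R_true.T + t_true
R, t = estimate_pose(model, observed)
print(np.allclose(R, R_true) and np.allclose(t, t_true))
```

Running this step per frame, against points sampled from the stored shape representation, yields the sequence of poses that downstream planning or MR rendering consumes.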
-
Publication No.: US11948078B2
Publication Date: 2024-04-02
Application No.: US17000048
Filing Date: 2020-08-21
Applicant: Nvidia Corporation
Inventor: Arash Vahdat , Tanmay Gupta , Xiaodong Yang , Jan Kautz
IPC: G06N3/08 , G06F18/214 , G06F18/22 , G06V10/74 , G06V10/82 , G06V30/19 , G06V30/262
CPC classification number: G06N3/08 , G06F18/2148 , G06F18/22 , G06V10/761 , G06V10/82 , G06V30/1916 , G06V30/19173 , G06V30/274
Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
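A common instance of a mutual-information critic over paired embeddings is the InfoNCE-style contrastive objective: matched image/text pairs sit on the diagonal of a similarity matrix, and the loss rewards scoring them above mismatched pairs. This generic form is an assumption for illustration, not necessarily the patent's exact critic function.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.1):
    """Contrastive loss over N matched image/text embedding pairs."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(probs)
    return float(-np.log(probs[np.arange(n), np.arange(n)]).mean())

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
aligned = info_nce(emb, emb)                 # perfectly matched pairs
shuffled = info_nce(emb, emb[::-1])          # deliberately mismatched pairs
print(aligned < shuffled)
```

Minimizing such a loss pushes each image embedding toward its own caption's embedding and away from the other captions in the batch, which is what "learning relationships between the set of image embeddings and the set of text embeddings" amounts to in practice.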
-
Publication No.: US20230394781A1
Publication Date: 2023-12-07
Application No.: US18083397
Filing Date: 2022-12-16
Applicant: NVIDIA Corporation
Inventor: Ali Hatamizadeh , Hongxu Yin , Jan Kautz , Pavlo Molchanov
CPC classification number: G06V10/42 , G06V10/44 , G06V10/82 , G06T3/40 , G06V10/7715
Abstract: Vision transformers are deep learning models that employ a self-attention mechanism to obtain feature representations for an input image. To date, the configuration of vision transformers has limited the self-attention computation to a local window of the input image, such that only short-range dependencies are modeled in the output. The present disclosure provides a vision transformer that captures global context, and that is therefore able to model long-range dependencies in its output.
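The window-limited versus global contrast can be shown with single-head attention over a token sequence: the windowed variant only lets each token attend inside its block, while the global variant attends across all tokens. The single-head, no-projection form and all dimensions are simplifications, not the disclosed architecture.

```python
import numpy as np

def attention(Q, K, V):
    """Plain scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def windowed_attention(X, window):
    """Each token attends only within its non-overlapping local window."""
    out = np.empty_like(X)
    for s in range(0, len(X), window):
        blk = X[s:s + window]
        out[s:s + window] = attention(blk, blk, blk)
    return out

rng = np.random.default_rng(2)
tokens = rng.standard_normal((8, 4))           # 8 patch tokens, dim 4
local_out = windowed_attention(tokens, window=4)
global_out = attention(tokens, tokens, tokens)  # long-range dependencies
print(local_out.shape, global_out.shape)
```

The two outputs differ precisely because the global pass mixes information between tokens that the windowed pass keeps separate, which is the dependency range the abstract is concerned with.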
-
Publication No.: US20230290038A1
Publication Date: 2023-09-14
Application No.: US18320446
Filing Date: 2023-05-19
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
CPC classification number: G06T15/04 , G06T7/579 , G06T7/70 , G06T17/20 , G06T15/20 , G06T2207/30244 , G06T2207/20084 , G06T2207/10016
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
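An invariance constraint of the kind the abstract names can be illustrated as a penalty on per-frame estimates drifting apart: since the object's texture and base shape should be the same in every frame, one simple loss is each frame's deviation from the across-frame mean. The loss form is an illustrative assumption; in the patent the constraints act inside network fine-tuning.

```python
import numpy as np

def invariance_loss(per_frame_estimates):
    """Mean squared deviation of each frame's estimate from their mean."""
    mean_est = per_frame_estimates.mean(axis=0)
    return float(((per_frame_estimates - mean_est) ** 2).mean())

consistent = np.tile(np.linspace(0, 1, 6), (4, 1))    # same texture, 4 frames
drifting = consistent + np.arange(4)[:, None] * 0.1   # texture drifts per frame
print(invariance_loss(consistent), invariance_loss(drifting))
```

Because the penalty needs no labels, only the video itself, it is exactly the kind of self-supervised signal that lets the system train on unlabeled footage.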
-
Publication No.: US20230252692A1
Publication Date: 2023-08-10
Application No.: US17929182
Filing Date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
CPC classification number: G06T11/001 , G06T3/0093
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
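The label-propagation step enabled by the shared coordinate space can be sketched directly: once every image's pixels carry canonical coordinates, a pixel in a synthesized image takes the semantic label of the nearest reference pixel in that space. The nearest-neighbour rule, coordinates, and labels below are hypothetical stand-ins for the learned dense correspondence map.

```python
import numpy as np

def propagate_labels(ref_coords, ref_labels, tgt_coords):
    """Copy each target pixel's label from its nearest canonical neighbour."""
    labels = []
    for c in tgt_coords:
        d = np.linalg.norm(ref_coords - c, axis=1)
        labels.append(ref_labels[int(np.argmin(d))])
    return labels

ref_coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # canonical space
ref_labels = ["eye", "nose", "mouth"]                        # annotated reference
tgt_coords = np.array([[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]) # synthesized image
propagated = propagate_labels(ref_coords, ref_labels, tgt_coords)
print(propagated)
```

This is why annotating a handful of reference images suffices: every GAN-synthesized image is warped into the same canonical frame, so labels transfer by proximity rather than per-image annotation.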
-