-
Publication number: US20250111476A1
Publication date: 2025-04-03
Application number: US18890544
Filing date: 2024-09-19
Applicant: NVIDIA Corporation
Inventor: Benjamin David Eckart , Anthea Li , Chao Liu , Kevin Shih , Jan Kautz
IPC: G06T3/4046
Abstract: Parametric distributions of data are one type of data model that can be used for various purposes, such as computer vision tasks that may include classification, segmentation, 3D reconstruction, etc. These parametric distributions of data may be computed from a given data set, which may be unstructured and/or which may include low-dimensional data. Current solutions for learning parametric distributions of data involve explicitly learning kernel parameters. However, this explicit learning approach is not only inefficient in that it incurs a high computational cost (i.e., a large number of floating point operations per second), but it also leaves room for improvement in terms of accuracy of the resulting learned model. The present disclosure provides a neural network architecture that implicitly learns a parametric distribution of data, which can reduce the computational cost while improving accuracy when compared with prior solutions that rely on the explicit learning design.
-
Publication number: US12169882B2
Publication date: 2024-12-17
Application number: US17929182
Filing date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
-
Publication number: US20240185034A1
Publication date: 2024-06-06
Application number: US18130648
Filing date: 2023-04-04
Applicant: NVIDIA Corporation
Inventor: Ali Hatamizadeh , Gregory Heinrich , Hongxu Yin , Jose Manuel Alvarez Lopez , Jan Kautz , Pavlo Molchanov
IPC: G06N3/0455 , G06N3/0464 , G06N3/08
CPC classification number: G06N3/0455 , G06N3/0464 , G06N3/08
Abstract: Apparatuses, systems, and techniques of using one or more machine learning processes (e.g., neural network(s)) to process data (e.g., using hierarchical self-attention). In at least one embodiment, image data is classified using hierarchical self-attention generated using carrier tokens that are associated with windowed subregions of the image data, and local attention generated using local tokens within the windowed subregions and the carrier tokens.
-
Publication number: US11960570B2
Publication date: 2024-04-16
Application number: US17412091
Filing date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao , Sifei Liu , Shalini De Mello , Zhiding Yu , Jan Kautz
IPC: G06F18/00 , G06F18/213 , G06F18/214 , G06N3/08 , G06V10/22 , G06V30/14
CPC classification number: G06F18/2155 , G06F18/213 , G06N3/08 , G06V10/22 , G06V30/1444
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
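Note: the "pull corresponding pairs closer, push contrasting pairs apart" idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the form of the image-level loss, so the InfoNCE-style formulation, the cosine similarity, the `temperature` parameter, and all function names below are assumptions chosen for illustration, not the patented method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the patent).

    anchor / positive: embeddings of a corresponding image pair
                       (different views of the same object).
    negatives:         embeddings of contrasting images (different objects).
    The loss is small when the anchor is closer to the positive than
    to every negative.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings, for illustration only.
anchor = [1.0, 0.0]
same_object = [0.9, 0.1]
other_object = [0.0, 1.0]

aligned = contrastive_loss(anchor, same_object, [other_object])
misaligned = contrastive_loss(anchor, other_object, [same_object])
```

In this toy setup `aligned` is lower than `misaligned`, which is the signal that backpropagation would use to update the network weights. A pixel-level variant would apply the same kind of loss to per-pixel or per-region features instead of whole-image embeddings.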
-
Publication number: US20240096115A1
Publication date: 2024-03-21
Application number: US18243555
Filing date: 2023-09-07
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Jan Kautz , Arash Vahdat , Hongxu Yin , Paul Micaelli
CPC classification number: G06V20/597 , G06T7/70 , G06V10/82 , G06V20/70 , G06V40/171 , G06T2207/30201 , G06V2201/07
Abstract: Landmark detection refers to the detection of landmarks within an image or a video, and is used in many computer vision tasks such as emotion recognition, face identity verification, hand tracking, gesture recognition, and eye gaze tracking. Current landmark detection methods rely on a cascaded computation through cascaded networks or an ensemble of multiple models, which starts with an initial guess of the landmarks and iteratively produces corrected landmarks that match the input more finely. However, the iterations required by current methods typically increase the training memory cost linearly, and do not have an obvious stopping criterion. Moreover, these methods tend to exhibit jitter in landmark detection results for video. The present disclosure improves current landmark detection methods by providing landmark detection using an iterative neural network. Furthermore, when detecting landmarks in video, the present disclosure provides for a reduction in jitter due to reuse of previous hidden states from previous frames.
-
Publication number: US11907846B2
Publication date: 2024-02-20
Application number: US17017597
Filing date: 2020-09-10
Applicant: NVIDIA CORPORATION
Inventor: Sifei Liu , Shalini De Mello , Varun Jampani , Jan Kautz , Xueting Li
IPC: G06K9/36 , G06N3/084 , G06F18/22 , G06F18/20 , G06F18/214 , G06F18/21 , G06N3/045 , G06T17/00 , G06V10/82
CPC classification number: G06N3/084 , G06F18/214 , G06F18/2163 , G06F18/22 , G06F18/29 , G06N3/045 , G06T17/00 , G06V10/82
Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities.
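Note: the propagation step described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: points are 2-D, "spatially adjacent" is taken to mean "within a fixed `radius`", features are scalars, the update rule is a simple affinity-weighted linear blend, and the affinities are supplied by hand rather than produced by neural network layers. The names `build_dag` and `propagate` are hypothetical, and the final conversion of propagated features into labels is omitted.

```python
import math


def build_dag(points, radius, direction=(1.0, 0.0)):
    """Connect nearby points with directed edges that follow `direction`.

    An edge j -> i exists when point j lies within `radius` of point i and
    has a strictly smaller projection onto `direction`, so the graph cannot
    contain a cycle.
    Returns (parents, order): the parent list of each point and a
    topological order of the point indices.
    """
    proj = [p[0] * direction[0] + p[1] * direction[1] for p in points]
    parents = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if proj[j] < proj[i] and math.dist(p, q) <= radius:
                parents[i].append(j)
    order = sorted(range(len(points)), key=lambda i: proj[i])
    return parents, order


def propagate(features, parents, order, affinity):
    """Propagate scalar features along the DAG.

    affinity[(j, i)] is the pairwise weight on edge j -> i. Each point mixes
    its own feature with the already-propagated values of its parents:
        h_i = (1 - sum_j w_ji) * x_i + sum_j w_ji * h_j
    """
    hidden = [0.0] * len(features)
    for i in order:
        total = sum(affinity[(j, i)] for j in parents[i])
        mixed = sum(affinity[(j, i)] * hidden[j] for j in parents[i])
        hidden[i] = (1.0 - total) * features[i] + mixed
    return hidden


# Three points on a line; only the first carries a non-zero feature.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
parents, order = build_dag(points, radius=1.0)
affinity = {(0, 1): 0.5, (1, 2): 0.5}  # hand-picked weights for illustration
hidden = propagate([1.0, 0.0, 0.0], parents, order, affinity)
```

The feature of the first point decays as it travels along the edges, which shows how a single pass over the DAG spreads information between unstructured points. The abstract refers to "a first direction" and "a first DAG", which suggests that further passes along other directions are possible; that part is not sketched here.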
-
Publication number: US20220284232A1
Publication date: 2022-09-08
Application number: US17188397
Filing date: 2021-03-01
Applicant: NVIDIA Corporation
Inventor: Hongxu Yin , Arun Mallya , Arash Vahdat , Jose Manuel Alvarez Lopez , Jan Kautz , Pavlo Molchanov
Abstract: Apparatuses, systems, and techniques to identify one or more images used to train one or more neural networks. In at least one embodiment, one or more images used to train one or more neural networks are identified, based on, for example, one or more labels of one or more objects within the one or more images.
-
Publication number: US11375176B2
Publication date: 2022-06-28
Application number: US16780738
Filing date: 2020-02-03
Applicant: NVIDIA Corporation
Inventor: Hung-Yu Tseng , Shalini De Mello , Jonathan Tremblay , Sifei Liu , Jan Kautz , Stanley Thomas Birchfield
IPC: H04N13/282 , H04N13/268 , G06K9/62 , G06N3/08
Abstract: When an image is projected from 3D, the viewpoint of objects in the image, relative to the camera, must be determined. Since the image itself will not have sufficient information to determine the viewpoint of the various objects in the image, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on an object category basis, but must first be trained with numerous examples that have been manually created. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimations for a new object category.
-
Publication number: US20220139037A1
Publication date: 2022-05-05
Application number: US17578051
Filing date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Varun Jampani , Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication number: US11315018B2
Publication date: 2022-04-26
Application number: US15786406
Filing date: 2017-10-17
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Stephen Walter Tyree , Tero Tapani Karras , Timo Oskari Aila , Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for neural network pruning. The method includes the steps of receiving first-order gradients of a cost function relative to layer parameters for a trained neural network and computing a pruning criterion for each layer parameter based on the first-order gradient corresponding to the layer parameter, where the pruning criterion indicates an importance of each neuron that is included in the trained neural network and is associated with the layer parameter. The method includes the additional steps of identifying at least one neuron having a lowest importance and removing the at least one neuron from the trained neural network to produce a pruned neural network.
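Note: the steps in the abstract above (score each neuron from first-order gradients, find the least important one, remove it) can be sketched as below. The abstract does not give the exact criterion, so the score used here, the sum of |weight × gradient| over a neuron's parameters, is an assumption made for illustration. The function names and the list-of-rows layer layout are also hypothetical.

```python
def neuron_importance(weights, grads):
    """Score each neuron from first-order gradients.

    weights[i] and grads[i] hold the parameters of neuron i and the gradient
    of the cost function with respect to those parameters.
    The score sum(|w * dC/dw|) is an assumed criterion, not the patent's.
    """
    return [
        sum(abs(w * g) for w, g in zip(w_row, g_row))
        for w_row, g_row in zip(weights, grads)
    ]


def prune_lowest(weights, grads, num_to_remove=1):
    """Remove the `num_to_remove` neurons with the lowest importance.

    Returns (kept_weights, removed_indices, scores).
    """
    scores = neuron_importance(weights, grads)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    removed = set(ranked[:num_to_remove])
    kept = [row for i, row in enumerate(weights) if i not in removed]
    return kept, sorted(removed), scores


# Hypothetical layer with three neurons and two parameters each.
weights = [[1.0, -2.0], [0.1, 0.1], [3.0, 0.5]]
grads = [[0.5, 0.5], [0.2, -0.1], [-1.0, 2.0]]
pruned, removed, scores = prune_lowest(weights, grads)
```

Here the second neuron has both small weights and small gradients, so it receives the lowest score and is the one removed. In practice the gradients would come from backpropagation on a trained network, and pruning would typically alternate with fine-tuning.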