-
Publication No.: US12169882B2
Publication date: 2024-12-17
Application No.: US17929182
Filing date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
-
Publication No.: US11960570B2
Publication date: 2024-04-16
Application No.: US17412091
Filing date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao , Sifei Liu , Shalini De Mello , Zhiding Yu , Jan Kautz
IPC: G06F18/00 , G06F18/213 , G06F18/214 , G06N3/08 , G06V10/22 , G06V30/14
CPC classification number: G06F18/2155 , G06F18/213 , G06N3/08 , G06V10/22 , G06V30/1444
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
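Illustrative sketch (not from the patent): the "pull corresponding pairs closer, push contrasting pairs apart" idea in the abstract above is commonly expressed as an InfoNCE-style loss over embedding similarities. The function names, the cosine similarity, and the `temperature` parameter below are assumptions chosen for illustration; the abstract does not specify the loss form.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def image_level_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not the patented one).

    anchor/positive: embeddings of two views of the same object.
    negatives: embeddings of images showing different objects.
    The loss is small when the anchor is closer to the positive
    than to every negative.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings, for illustration only.
anchor = [1.0, 0.0]
same_object = [0.9, 0.1]
other_objects = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = image_level_contrastive_loss(anchor, same_object, other_objects)
loss_misaligned = image_level_contrastive_loss(
    anchor, [0.0, 1.0], [same_object, [-1.0, 0.0]]
)
```

In a training loop the loss would be computed on network outputs and backpropagated; here the embeddings are fixed, so the sketch only shows that a well-aligned pair yields a lower loss than a misaligned one.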
-
Publication No.: US20240095534A1
Publication date: 2024-03-21
Application No.: US18243348
Filing date: 2023-09-07
Applicant: NVIDIA Corporation
Inventor: Anima Anandkumar , Chaowei Xiao , Weili Nie , De-An Huang , Zhiding Yu , Manli Shu
Abstract: Apparatuses, systems, and techniques to perform neural networks. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks.
-
Publication No.: US20220292306A1
Publication date: 2022-09-15
Application No.: US17201816
Filing date: 2021-03-15
Applicant: NVIDIA Corporation
Inventor: Subhashree Radhakrishnan , Partha Sriram , Farzin Aghdasi , Seunghwan Cha , Zhiding Yu
Abstract: In various examples, training methods are described to generate a trained neural network that is robust to various environmental features. In an embodiment, training includes modifying images of a dataset and generating boundary boxes and/or other segmentation information for the modified images, which is used to train a neural network.
-
Publication No.: US20240265690A1
Publication date: 2024-08-08
Application No.: US18544840
Filing date: 2023-12-19
Applicant: NVIDIA Corporation
Inventor: Animashree Anandkumar , Linxi Fan , Zhiding Yu , Chaowei Xiao , Shikun Liu
CPC classification number: G06V10/82 , G06V10/811
Abstract: A vision-language model learns skills and domain knowledge via distinct and separate task-specific neural networks, referred to as experts. Each expert is independently optimized for a specific task, facilitating the use of domain-specific data and architectures that are not feasible with a single large neural network trained for multiple tasks. The vision-language model is implemented as an ensemble of pre-trained experts and is more efficiently trained compared with the single large neural network. During training, the vision-language model integrates specialized skills and domain knowledge, rather than trying to simultaneously learn multiple tasks, resulting in effective multi-modal learning.
-
Publication No.: US11899749B2
Publication date: 2024-02-13
Application No.: US17201816
Filing date: 2021-03-15
Applicant: NVIDIA Corporation
Inventor: Subhashree Radhakrishnan , Partha Sriram , Farzin Aghdasi , Seunghwan Cha , Zhiding Yu
CPC classification number: G06F18/214 , G06T3/0006 , G06T7/12 , G06V10/22 , G06V10/242 , G06V20/40 , G06T2207/20081 , G06T2207/20084
Abstract: In various examples, training methods are described to generate a trained neural network that is robust to various environmental features. In an embodiment, training includes modifying images of a dataset and generating boundary boxes and/or other segmentation information for the modified images, which is used to train a neural network.
-
Publication No.: US20230376849A1
Publication date: 2023-11-23
Application No.: US18318212
Filing date: 2023-05-16
Applicant: NVIDIA Corporation
Inventor: Rafid Reza Mahmood , Marc Law , James Robert Lucas , Zhiding Yu , Jose Manuel Alvarez Lopez , Sanja Fidler
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: In various examples, estimating optimal training data set sizes for machine learning model systems and applications. Systems and methods are disclosed that estimate an amount of data to include in a training data set, where the training data set is then used to train one or more machine learning models to reach a target validation performance. To estimate the amount of training data, subsets of an initial training data set may be used to train the machine learning model(s) in order to determine estimates for the minimum amount of training data needed to train the machine learning model(s) to reach the target validation performance. The estimates may then be used to generate one or more functions, such as a cumulative density function and/or a probability density function, wherein the function(s) is then used to estimate the amount of training data needed to train the machine learning model(s).
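Illustrative sketch (not from the patent): one simple way to turn several per-subset estimates of the minimum required data size into a cumulative function and read off a recommended size. The empirical-CDF construction, the function names, the `confidence` parameter, and the sample numbers are all assumptions for illustration; the abstract does not specify how the function(s) are generated.

```python
def empirical_cdf(estimates):
    """Return (size, cumulative probability) pairs, sorted by size.

    estimates: minimum training set sizes, one per training run on a
    subset of the initial data (hypothetical inputs).
    """
    ordered = sorted(estimates)
    n = len(ordered)
    return [(size, (i + 1) / n) for i, size in enumerate(ordered)]


def required_size(estimates, confidence=0.9):
    """Smallest size whose cumulative probability reaches `confidence`."""
    for size, prob in empirical_cdf(estimates):
        if prob >= confidence:
            return size
    return max(estimates)


# Hypothetical estimates of the data needed to reach a target
# validation performance, one per subset experiment.
estimates = [1200, 900, 1500, 1100, 1000]
recommended = required_size(estimates, confidence=0.75)
```

A higher `confidence` selects a larger data set, trading collection cost against the risk of falling short of the target validation performance.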
-
Publication No.: US20230252692A1
Publication date: 2023-08-10
Application No.: US17929182
Filing date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
CPC classification number: G06T11/001 , G06T3/0093
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
-
Publication No.: US20230074706A1
Publication date: 2023-03-09
Application No.: US17412091
Filing date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao , Sifei Liu , Shalini De Mello , Zhiding Yu , Jan Kautz
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
-
Publication No.: US11367268B2
Publication date: 2022-06-21
Application No.: US16998890
Filing date: 2020-08-20
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang , Yang Zou , Zhiding Yu , Jan Kautz
Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class. The identification-related features may then be used to train a neural network to perform re-identification of objects in that object class from images captured from the second domain.