Patent search ap:("NVIDIA Corporation") AND inv:"Arash Vahdat" Page 1

1.

Invention application
SINGLE IMAGE TO REALISTIC 3D OBJECT GENERATION VIA SEMI-SUPERVISED 2D AND 3D JOINT TRAINING [In force]

Publication (announcement) No.: US20250111592A1

Publication (announcement) date: 2025-04-03

Application No.: US18892186

Application date: 2024-09-20

Applicant: NVIDIA Corporation

Inventor: Dejia Xu, Morteza Mardani, Jiaming Song, Sifei Liu, Ye Yuan, Arash Vahdat

IPC: G06T15/20, G06V10/774, G06V10/776, G06V10/82

Abstract: Virtual reality and augmented reality bring increasing demand for 3D content creation. In an effort to automate the generation of 3D content, artificial intelligence-based processes have been developed. However, these processes are limited in terms of the quality of their output because they typically involve a model trained on limited 3D data thereby resulting in a model that does not generalize well to unseen objects, or a model trained on 2D data thereby resulting in a model that suffers from poor geometry due to ignorance of 3D information. The present disclosure jointly uses both 2D and 3D data to train a machine learning model to be able to generate 3D content from a single 2D image.

2.

Invention publication
PHYSICS-GUIDED MOTION DIFFUSION MODEL [Under examination, published]

Publication (announcement) No.: US20240169636A1

Publication (announcement) date: 2024-05-23

Application No.: US18317378

Application date: 2023-05-15

Applicant: NVIDIA Corporation

Inventor: Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz

IPC: G06T13/40, G06T5/00, G06T13/80

CPC classification number: G06T13/40, G06T5/002, G06T13/80, G06T2207/20081, G06T2207/20084

Abstract: Systems and methods are disclosed that improve performance of synthesized motion generated by a diffusion neural network model. A physics-guided motion diffusion model incorporates physical constraints into the diffusion process to model the complex dynamics induced by forces and contact. Specifically, a physics-based motion projection module uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically plausible motion. The projected motion is further used in the next diffusion iteration to guide the denoising diffusion process. The use of physical constraints in the physics-guided motion diffusion model iteratively pulls the motion toward a physically-plausible space, reducing artifacts such as floating, foot sliding, and ground penetration.
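The denoise-then-project loop described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "motion" is a one-dimensional list of foot heights, `toy_denoise` stands in for the learned diffusion denoiser, and `toy_project` replaces the physics-simulator motion imitation with a simple clamp that removes ground penetration. It is not the method claimed in the application.

```python
import math
import random

GROUND = 0.0  # assumed ground height for the toy example


def toy_denoise(motion, target, strength):
    # Hypothetical stand-in for a learned denoiser:
    # move each frame part of the way toward a clean target motion.
    return [m + strength * (t - m) for m, t in zip(motion, target)]


def toy_project(motion):
    # Hypothetical stand-in for the physics-based projection:
    # here it only removes ground penetration by clamping heights.
    return [max(GROUND, m) for m in motion]


def physics_guided_sampling(target, steps=10, seed=0):
    rng = random.Random(seed)
    motion = [rng.gauss(0.0, 1.0) for _ in target]  # start from noise
    for step in range(steps):
        strength = (step + 1) / steps
        denoised = toy_denoise(motion, target, strength)
        # The projected motion, not the raw denoised one, feeds the next iteration.
        motion = toy_project(denoised)
    return motion


target = [abs(math.sin(0.3 * i)) for i in range(20)]
result = physics_guided_sampling(target)
```

Because the projection runs inside every iteration, each intermediate sample already satisfies the toy constraint (no height below `GROUND`), which mirrors the abstract's point about iteratively pulling the motion toward a physically plausible space.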

3.

Invention grant
Diffusion-based generative modeling for synthetic data generation systems and applications [In force]

Publication (announcement) No.: US12299962B2

Publication (announcement) date: 2025-05-13

Application No.: US17959915

Application date: 2022-10-04

Applicant: Nvidia Corporation

Inventor: Karsten Kreis, Tim Dockhorn, Arash Vahdat

IPC: G06V10/774, G06N3/045, G06N3/047, G06T7/277

Abstract: Systems and methods described relate to the synthesis of content using generative models. In at least one embodiment, a score-based generative model can use a stochastic differential equation with critically-damped Langevin diffusion to learn to synthesize content. During a forward diffusion process, noise can be introduced into a set of auxiliary (e.g., “velocity”) values for an input image to learn a score function. This score function can be used with the stochastic differential equation during a reverse diffusion denoising process to remove noise from the image to generate a reconstructed version of the input image. A score matching objective for the critically-damped Langevin diffusion process can require only the conditional distribution learned from the velocity data. A stochastic differential equation based integrator can then allow for efficient sampling from these critically-damped Langevin diffusion models.
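For orientation, one commonly used way to write a critically-damped Langevin forward diffusion couples the data variable with an auxiliary velocity variable, with noise entering only through the velocity. The abstract does not give the equations, so the symbols below (data x_t, velocity v_t, mass M, friction Γ, time scale β, Wiener process w_t) are assumptions, not notation from the patent.

```latex
\begin{aligned}
dx_t &= M^{-1} v_t \,\beta\, dt, \\
dv_t &= -x_t \,\beta\, dt \;-\; \Gamma M^{-1} v_t \,\beta\, dt \;+\; \sqrt{2\Gamma\beta}\, dw_t,
\qquad \Gamma^2 = 4M \ \text{(critical damping)}.
\end{aligned}
```

In this form only the velocity equation has a noise term, which is consistent with the abstract's statement that noise is introduced into the auxiliary "velocity" values.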

4.

Invention publication
DIFFUSION-BASED OPEN-VOCABULARY SEGMENTATION [Under examination, published]

Publication (announcement) No.: US20240153093A1

Publication (announcement) date: 2024-05-09

Application No.: US18310414

Application date: 2023-05-01

Applicant: NVIDIA Corporation

Inventor: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon

IPC: G06T7/10, G06V10/40

CPC classification number: G06T7/10, G06V10/40, G06T2207/20081, G06T2207/20084

Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training, and instead can also successfully perform segmentation of object categories not seen during training and only seen during testing and inferencing. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.

5.

Invention publication
LANDMARK DETECTION WITH AN ITERATIVE NEURAL NETWORK [Under examination, published]

Publication (announcement) No.: US20240096115A1

Publication (announcement) date: 2024-03-21

Application No.: US18243555

Application date: 2023-09-07

Applicant: NVIDIA Corporation

Inventor: Pavlo Molchanov, Jan Kautz, Arash Vahdat, Hongxu Yin, Paul Micaelli

IPC: G06V20/59, G06T7/70, G06V10/82, G06V20/70, G06V40/16

CPC classification number: G06V20/597, G06T7/70, G06V10/82, G06V20/70, G06V40/171, G06T2207/30201, G06V2201/07

Abstract: Landmark detection refers to the detection of landmarks within an image or a video, and is used in many computer vision tasks such as emotion recognition, face identity verification, hand tracking, gesture recognition, and eye gaze tracking. Current landmark detection methods rely on a cascaded computation through cascaded networks or an ensemble of multiple models, which starts with an initial guess of the landmarks and iteratively produces corrected landmarks which match the input more finely. However, the iterations required by current methods typically increase the training memory cost linearly, and do not have an obvious stopping criterion. Moreover, these methods tend to exhibit jitter in landmark detection results for video. The present disclosure improves current landmark detection methods by providing landmark detection using an iterative neural network. Furthermore, when detecting landmarks in video, the present disclosure provides for a reduction in jitter due to reuse of previous hidden states from previous frames.

6.

Invention application
TECHNIQUES TO IDENTIFY DATA USED TO TRAIN ONE OR MORE NEURAL NETWORKS [In force]

Publication (announcement) No.: US20220284232A1

Publication (announcement) date: 2022-09-08

Application No.: US17188397

Application date: 2021-03-01

Applicant: NVIDIA Corporation

Inventor: Hongxu Yin, Arun Mallya, Arash Vahdat, Jose Manuel Alvarez Lopez, Jan Kautz, Pavlo Molchanov

IPC: G06K9/62, G06K9/66

Abstract: Apparatuses, systems, and techniques to identify one or more images used to train one or more neural networks. In at least one embodiment, one or more images used to train one or more neural networks are identified, based on, for example, one or more labels of one or more objects within the one or more images.

7.

Invention application
JOINT REPRESENTATION LEARNING FROM IMAGES AND TEXT [In force]

Publication (announcement) No.: US20210056353A1

Publication (announcement) date: 2021-02-25

Application No.: US17000048

Application date: 2020-08-21

Applicant: Nvidia Corporation

Inventor: Arash Vahdat, Tanmay Gupta, Xiaodong Yang, Jan Kautz

IPC: G06K9/62, G06N3/08

Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
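As a rough illustration of scoring a critic with a mutual-information objective over paired image and text embeddings, the sketch below evaluates the InfoNCE lower bound. InfoNCE is one common mutual-information estimator chosen here as an assumption; the abstract does not say which bound is used. The dot-product `dot_critic` and the toy embeddings are likewise hypothetical, and a trained critic would have learnable parameters.

```python
import math


def dot_critic(image_vec, text_vec):
    # Hypothetical critic: a plain dot product between the two embeddings.
    return sum(a * b for a, b in zip(image_vec, text_vec))


def info_nce_bound(image_embs, text_embs, critic=dot_critic):
    """InfoNCE lower bound on mutual information; pair i is (image_embs[i], text_embs[i])."""
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        scores = [critic(image_embs[i], text_embs[j]) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[i] - log_sum  # log-softmax score of the true pair
    return math.log(n) + total / n


images = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
aligned_texts = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
shuffled_texts = [aligned_texts[1], aligned_texts[2], aligned_texts[0]]

aligned_bound = info_nce_bound(images, aligned_texts)
shuffled_bound = info_nce_bound(images, shuffled_texts)
```

Training would adjust the critic (and, optionally, the embedding models) to raise this bound, so that matching image/text pairs score higher than mismatched ones. The bound can never exceed the logarithm of the batch size.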

8.

Invention application
TECHNIQUES FOR IDENTIFICATION OF OUT-OF-DISTRIBUTION INPUT DATA IN NEURAL NETWORKS [In force]

Publication (announcement) No.: US20250118083A1

Publication (announcement) date: 2025-04-10

Application No.: US18656257

Application date: 2024-05-06

Applicant: NVIDIA Corporation

Inventor: Sina Mohseni, Arash Vahdat, Jay Yadawa

IPC: G06V20/56, G06F18/211, G06F18/2415, G06F18/2431, G06N3/045, G06N3/08

Abstract: Apparatuses, systems, and techniques to identify out-of-distribution input data in one or more neural networks. In at least one embodiment, a technique includes training one or more neural networks to infer a plurality of characteristics about input information based, at least in part, on the one or more neural networks being independently trained to infer each of the plurality of characteristics about the input information.

9.

Invention application
VARIATIONAL INFERENCING BY A DIFFUSION MODEL [In force]

Publication (announcement) No.: US20250045892A1

Publication (announcement) date: 2025-02-06

Application No.: US18593742

Application date: 2024-03-01

Applicant: NVIDIA Corporation

Inventor: Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat

IPC: G06T7/00, G06T3/40, G06T5/70, G06T5/73, G06T5/77

Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. For example, they can be trained in the image domain to perform specific image restoration tasks, such as inpainting (e.g. completing an incomplete image), deblurring (e.g. removing blurring from an image), and super-resolution (e.g. increasing a resolution of an image), or they can be trained to perform image rendering tasks, including 2D-to-3D image generation tasks. However, current approaches to training diffusion models only allow the models to be optimized for a specific task such that they will not achieve high-quality results when used for other tasks. The present disclosure provides a diffusion model that uses variational inferencing to approximate a distribution of data, which allows the diffusion model to universally solve different tasks without having to be re-trained specifically for each task.

10.

Invention publication
FAIRNESS-BASED NEURAL NETWORK MODEL TRAINING USING REAL AND GENERATED DATA [Under examination, published]

Publication (announcement) No.: US20240144000A1

Publication (announcement) date: 2024-05-02

Application No.: US18307227

Application date: 2023-04-26

Applicant: NVIDIA Corporation

Inventor: Yuji Roh, Weili Nie, De-An Huang, Arash Vahdat, Animashree Anandkumar

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: A neural network model is trained for fairness and accuracy using both real and synthesized training data, such as images. During training a first sampling ratio between the real and synthesized training data is optimized. The first sampling ratio may comprise a value for each group (or attribute), where each value is optimized. A second sampling ratio defines relative amounts of training data that are used for each one of the groups. Furthermore, a neural network model accuracy and a fairness metric are both used for updating the first and second sampling ratios during training iterations. The neural network model may be trained using different classes of training data. The second sampling ratio may vary for each class.
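The two sampling ratios in this abstract can be pictured with a small batch sampler. This is only a sketch under stated assumptions: `synth_ratio` is read as a per-group probability of drawing a synthesized example (the "first" ratio), `group_ratio` as the per-group share of the batch (the "second" ratio), and all names and data are hypothetical. The optimization that updates the ratios from accuracy and fairness metrics is not shown, since the abstract does not describe it in detail.

```python
import random


def sample_batch(real, synthetic, group_ratio, synth_ratio, batch_size, seed=0):
    """Draw a batch of (group, source, example) triples.

    real, synthetic: dict mapping group -> list of examples.
    group_ratio: dict mapping group -> relative share of the batch (assumed "second" ratio).
    synth_ratio: dict mapping group -> probability of a synthesized example (assumed "first" ratio).
    """
    rng = random.Random(seed)
    groups = list(group_ratio)
    weights = [group_ratio[g] for g in groups]
    batch = []
    for _ in range(batch_size):
        group = rng.choices(groups, weights=weights, k=1)[0]
        use_synth = rng.random() < synth_ratio[group]
        pool = synthetic if use_synth else real
        source = "synthetic" if use_synth else "real"
        batch.append((group, source, rng.choice(pool[group])))
    return batch


real_data = {"a": ["ra1", "ra2"], "b": ["rb1"]}
synth_data = {"a": ["sa1"], "b": ["sb1", "sb2"]}
batch = sample_batch(
    real_data,
    synth_data,
    group_ratio={"a": 0.5, "b": 0.5},
    synth_ratio={"a": 0.0, "b": 1.0},
    batch_size=50,
)
```

With `synth_ratio` set to 0.0 for group "a" and 1.0 for group "b", every "a" example comes from the real pool and every "b" example from the synthesized pool, which makes the effect of the per-group ratio easy to inspect.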
