-
Publication No.: US20250029489A1
Publication Date: 2025-01-23
Application No.: US18379599
Filing Date: 2023-10-12
Applicant: NVIDIA Corporation
Inventor: Yulong Cao , Chaowei Xiao , Marco Pavone , Boris Ivanovic
IPC: G08G1/0967 , G06N3/02 , G08G1/01
Abstract: In various examples, a traffic model including one or more traffic scenarios may be generated and/or updated based on human feedback. Human feedback may be provided indicating a preference among various traffic scenarios to identify which scenarios in a model are more realistic. A reward model may capture the preference information and rank the realism of one or more traffic scenarios.
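The reward-model step here follows the familiar preference-learning recipe. A minimal sketch of a Bradley-Terry-style pairwise loss, assuming scenarios are already encoded as fixed-length feature vectors (the `ScenarioRewardModel` class, feature size, and training loop are illustrative, not taken from the filing):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScenarioRewardModel(nn.Module):
    """Scores a traffic scenario's realism from a fixed-length feature vector."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, scenario_feats: torch.Tensor) -> torch.Tensor:
        return self.net(scenario_feats).squeeze(-1)  # scalar reward per scenario

def preference_loss(model, preferred, rejected):
    """Bradley-Terry loss: push reward(preferred) above reward(rejected)."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()

# Toy usage: human labelers preferred scenario A over scenario B.
model = ScenarioRewardModel()
a, b = torch.randn(32, 128), torch.randn(32, 128)
loss = preference_loss(model, a, b)
loss.backward()
```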
-
Publication No.: US20240265690A1
Publication Date: 2024-08-08
Application No.: US18544840
Filing Date: 2023-12-19
Applicant: NVIDIA Corporation
Inventor: Animashree Anandkumar , Linxi Fan , Zhiding Yu , Chaowei Xiao , Shikun Liu
CPC classification number: G06V10/82 , G06V10/811
Abstract: A vision-language model learns skills and domain knowledge via distinct and separate task-specific neural networks, referred to as experts. Each expert is independently optimized for a specific task, facilitating the use of domain-specific data and architectures that are not feasible with a single large neural network trained for multiple tasks. The vision-language model is implemented as an ensemble of pre-trained experts and is trained more efficiently than a single large neural network. During training, the vision-language model integrates specialized skills and domain knowledge rather than trying to learn multiple tasks simultaneously, resulting in effective multi-modal learning.
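A minimal sketch of the ensemble-of-experts idea, assuming frozen pre-trained experts whose features are concatenated and fused by a small trainable layer; the expert names, dimensions, and fusion scheme are illustrative, not from the filing:

```python
import torch
import torch.nn as nn

class ExpertEnsembleVLM(nn.Module):
    """Routes an image through frozen task-specific experts and fuses
    their feature outputs for a downstream language head."""
    def __init__(self, experts: dict, feat_dim: int):
        super().__init__()
        self.experts = nn.ModuleDict(experts)
        for p in self.experts.parameters():
            p.requires_grad_(False)  # experts stay frozen; only the fusion trains
        self.fuse = nn.Linear(feat_dim * len(experts), feat_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = [expert(image) for expert in self.experts.values()]
        return self.fuse(torch.cat(feats, dim=-1))

# Toy experts standing in for, e.g., depth and segmentation backbones.
experts = {
    "depth": nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64)),
    "seg": nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64)),
}
vlm = ExpertEnsembleVLM(experts, feat_dim=64)
fused = vlm(torch.randn(2, 3, 32, 32))  # (2, 64) fused features
```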
-
Publication No.: US20240095534A1
Publication Date: 2024-03-21
Application No.: US18243348
Filing Date: 2023-09-07
Applicant: NVIDIA Corporation
Inventor: Anima Anandkumar , Chaowei Xiao , Weili Nie , De-An Huang , Zhiding Yu , Manli Shu
Abstract: Apparatuses, systems, and techniques to perform inference using neural networks. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks.
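One common way to realize "most consistent output under input variances" is test-time augmentation with a consensus check. A hedged sketch (the augmentation, distance metric, and view count are assumptions, not the claimed method):

```python
import torch

def most_consistent_output(model, x, augment, n_views: int = 8):
    """Runs the model on several perturbed views of the input and keeps
    the prediction that agrees most with the mean prediction."""
    with torch.no_grad():
        outputs = torch.stack([model(augment(x)) for _ in range(n_views)])
    mean = outputs.mean(dim=0, keepdim=True)
    # Lowest squared distance to the consensus = most consistent view.
    idx = (outputs - mean).pow(2).flatten(1).sum(-1).argmin()
    return outputs[idx]

# Toy usage with small Gaussian input perturbations as the "variances".
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(10, 3))
x = torch.randn(1, 10)
y = most_consistent_output(model, x, lambda t: t + 0.05 * torch.randn_like(t))
```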
-
Publication No.: US20240017745A1
Publication Date: 2024-01-18
Application No.: US17865344
Filing Date: 2022-07-14
Applicant: NVIDIA Corporation
Inventor: Yulong Cao , Chaowei Xiao , Danfei Xu , Anima Anandkumar , Marco Pavone
CPC classification number: B60W60/0027 , B60W40/04 , B60W50/0097 , B60W2554/4041 , B60W2554/4044
Abstract: Apparatuses, systems, and techniques to generate trajectory data for moving objects. In at least one embodiment, adversarial trajectories are generated to evaluate a trajectory prediction model and are based, at least in part, on a differentiable dynamic model.
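The abstract's pairing of adversarial optimization with a differentiable dynamics model can be illustrated with a simple unicycle rollout: perturb the controls by gradient steps so the trajectory deviates from the nominal path while remaining dynamically feasible. The dynamics, objective, and step sizes below are illustrative assumptions, not the filing's method:

```python
import torch

def rollout(x0, controls, dt=0.1):
    """Differentiable unicycle rollout: state = (x, y, heading), control = (speed, yaw rate)."""
    states, s = [], x0
    for u in controls:
        x, y, th = s
        s = torch.stack([x + u[0] * torch.cos(th) * dt,
                         y + u[0] * torch.sin(th) * dt,
                         th + u[1] * dt])
        states.append(s)
    return torch.stack(states)

x0 = torch.zeros(3)
nominal = torch.tensor([[5.0, 0.0]] * 20)         # nominal controls
delta = (0.01 * torch.randn_like(nominal)).requires_grad_(True)
clean_path = rollout(x0, nominal)
for _ in range(50):
    adv_path = rollout(x0, nominal + delta)
    # Maximize deviation from the nominal path; keep the perturbation small.
    loss = -(adv_path - clean_path).pow(2).sum() + 10.0 * delta.pow(2).sum()
    loss.backward()
    with torch.no_grad():
        delta -= 0.05 * delta.grad
        delta.grad.zero_()
```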
-
Publication No.: US20240095447A1
Publication Date: 2024-03-21
Application No.: US17846866
Filing Date: 2022-06-22
Applicant: NVIDIA Corporation
Inventor: Wei Ping , Boxin Wang , Chaowei Xiao , Mohammad Shoeybi , Mostofa Patwary , Anima Anandkumar , Bryan Catanzaro
IPC: G06F40/279 , G06F40/205 , G06F40/55
CPC classification number: G06F40/279 , G06F40/205 , G06F40/55
Abstract: Apparatuses, systems, and techniques are presented to identify and prevent generation of restricted content. In at least one embodiment, one or more neural networks are used to identify restricted content based only on the restricted content.
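The abstract is terse, but one reading of "based only on the restricted content" is a one-class detector built solely from embeddings of restricted examples. A hypothetical sketch (the nearest-neighbor check, encoder stand-ins, and threshold are all assumptions):

```python
import torch
import torch.nn.functional as F

def build_detector(restricted_embeddings: torch.Tensor, threshold: float = 0.8):
    """One-class detector: flag content whose embedding is close to any
    known restricted example (built from restricted content only)."""
    bank = F.normalize(restricted_embeddings, dim=-1)
    def is_restricted(embedding: torch.Tensor) -> bool:
        sim = F.normalize(embedding, dim=-1) @ bank.T
        return bool(sim.max() >= threshold)
    return is_restricted

# Toy usage with random stand-ins for encoder outputs.
detector = build_detector(torch.randn(100, 256))
flagged = detector(torch.randn(1, 256))
```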
-
Publication No.: US20240062534A1
Publication Date: 2024-02-22
Application No.: US17893038
Filing Date: 2022-08-22
Applicant: NVIDIA Corporation
Inventor: Xiaojian Ma , Weili Nie , Zhiding Yu , Huaizu Jiang , Chaowei Xiao , Yuke Zhu , Anima Anandkumar
CPC classification number: G06V10/82 , G06V10/255 , G06V10/94
Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
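The "global task" can be illustrated as a supervised-contrastive objective that clusters [CLS] embeddings of images sharing a concept; the temperature and batch construction below are assumptions, and the local correspondence task is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def global_concept_loss(cls_tokens, concept_ids, temp: float = 0.07):
    """Pulls [CLS] embeddings of images sharing a concept together
    (supervised-contrastive style), pushing other concepts apart."""
    z = F.normalize(cls_tokens, dim=-1)
    logits = z @ z.T / temp
    logits.fill_diagonal_(float("-inf"))                  # ignore self-pairs
    same = concept_ids[:, None] == concept_ids[None, :]
    same.fill_diagonal_(False)
    log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
    return -(log_prob[same]).mean()

# Toy batch: four concepts, two images each.
loss = global_concept_loss(torch.randn(8, 192),
                           torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
```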
-
Publication No.: US20240029836A1
Publication Date: 2024-01-25
Application No.: US18353773
Filing Date: 2023-07-17
Applicant: NVIDIA Corporation
Inventor: Weili Nie , Zichao Wang , Chaowei Xiao , Animashree Anandkumar
Abstract: A machine learning framework is described for performing generation of candidate molecules for, e.g., drug discovery or other applications. The framework utilizes a pre-trained encoder-decoder model to interface between representations of molecules and embeddings for those molecules in a latent space. A fusion module is located between the encoder and decoder and is used to fuse an embedding for an input molecule with embeddings for one or more exemplary molecules selected from a database that is constructed according to design criteria. The fused embedding is decoded using the decoder to generate a candidate molecule. The fusion module is trained to reconstruct a nearest neighbor to the input molecule from the database based on the sampled exemplary molecules. An iterative approach may be used during inference to dynamically update the database to include newly generated candidate molecules.
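A minimal sketch of the fusion module as cross-attention between the input molecule's embedding (query) and retrieved exemplar embeddings (keys/values); the attention layout and dimensions are illustrative, not the filing's exact design:

```python
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    """Fuses an input molecule embedding with embeddings of retrieved
    exemplar molecules via cross-attention, sitting between a frozen
    encoder and decoder."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, z_input, z_exemplars):
        # Query: input molecule; keys/values: retrieved exemplars.
        fused, _ = self.attn(z_input, z_exemplars, z_exemplars)
        return self.norm(z_input + fused)

fusion = RetrievalFusion()
z = torch.randn(1, 1, 256)          # encoder embedding of the input molecule
exemplars = torch.randn(1, 5, 256)  # embeddings of 5 retrieved exemplars
z_fused = fusion(z, exemplars)      # passed on to the decoder
```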
-
Publication No.: US20250103968A1
Publication Date: 2025-03-27
Application No.: US18821611
Filing Date: 2024-08-30
Applicant: NVIDIA Corporation
Inventor: Zizheng Pan , De-An Huang , Weili Nie , Zhiding Yu , Chaowei Xiao , Anima Anandkumar
IPC: G06N20/20
Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. Diffusion probabilistic models use discrete-time random processes or continuous-time stochastic differential equations (SDEs) that learn to gradually remove the noise added to the data points. With diffusion probabilistic models, high-quality output currently requires sampling from a large diffusion probabilistic model, which comes at a high computational cost. The present disclosure stitches together the trajectories of two or more inferior diffusion probabilistic models during a denoising process, which can in turn accelerate the denoising process by avoiding use of only a single large diffusion probabilistic model.
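A hedged sketch of trajectory stitching: run the first, high-noise denoising steps with a cheaper model and hand off to a stronger one partway through. The switch point and toy step functions below are assumptions; real step functions would wrap trained denoisers:

```python
import torch

def stitched_sampling(x, small_step, large_step, num_steps: int = 50, switch: int = 25):
    """Runs the first (high-noise) denoising steps with a cheap model and
    finishes with a stronger one, stitching their sampling trajectories."""
    for t in reversed(range(num_steps)):
        step_fn = small_step if t >= switch else large_step
        x = step_fn(x, t)  # one reverse-diffusion update x_t -> x_{t-1}
    return x

# Toy stand-ins for the two models' reverse-diffusion step functions.
small = lambda x, t: 0.99 * x
large = lambda x, t: 0.98 * x
sample = stitched_sampling(torch.randn(1, 3, 8, 8), small, large)
```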
-
Publication No.: US20240087222A1
Publication Date: 2024-03-14
Application No.: US18515016
Filing Date: 2023-11-20
Applicant: NVIDIA Corporation
Inventor: Yiming Li , Zhiding Yu , Christopher B. Choy , Chaowei Xiao , Jose Manuel Alvarez Lopez , Sanja Fidler , Animashree Anandkumar
Abstract: An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.
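A compressed sketch of the two-transformer stage: query proposals cross-attend to image feature maps, unfilled voxel slots are padded with a learned mask token, and a self-attention layer refines the voxel features. All dimensions and the module layout are illustrative, not the filing's architecture:

```python
import torch
import torch.nn as nn

class VoxelSemanticsSketch(nn.Module):
    """Stage 1: query proposals cross-attend to image features.
    Stage 2: queries plus mask tokens are refined with self-attention."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.refine = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, img_feats, query_proposals, num_voxels: int):
        q, _ = self.cross(query_proposals, img_feats, img_feats)
        # Fill unqueried voxel slots with the learned mask token.
        pad = self.mask_token.expand(q.size(0), num_voxels - q.size(1), -1)
        voxels = torch.cat([q, pad], dim=1)
        return self.refine(voxels)  # refined voxel features

m = VoxelSemanticsSketch()
out = m(torch.randn(1, 100, 128), torch.randn(1, 30, 128), num_voxels=64)
```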
-
Publication No.: US20230290135A1
Publication Date: 2023-09-14
Application No.: US18119770
Filing Date: 2023-03-09
Applicant: NVIDIA Corporation
Inventor: Daquan Zhou , Zhiding Yu , Enze Xie , Anima Anandkumar , Chaowei Xiao , Jose Manuel Alvarez Lopez
IPC: G06V10/82 , G06V10/77 , G06V10/778 , G06V10/30
CPC classification number: G06V10/82 , G06V10/7715 , G06V10/778 , G06V10/30
Abstract: Apparatuses, systems, and techniques to generate a robust representation of an image. In at least one embodiment, input tokens of an input image are received, and an inference about the input image is generated based on a vision transformer (ViT) system comprising at least one self-attention module to perform token mixing and a channel self-attention module to perform channel processing.
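Channel self-attention can be sketched by transposing tokens and channels so the attention map mixes channels rather than spatial positions; the projection layout below is an assumption, not the patented module:

```python
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    """Self-attention over channels: a (C x C) attention map mixes
    channels, complementing token-mixing attention in the same block."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, tokens, channels)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Transpose so attention runs across channels, not tokens.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))        # (B, C, N)
        attn = (q @ k.transpose(-2, -1)) * q.size(-1) ** -0.5   # (B, C, C)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2)        # back to (B, N, C)
        return self.proj(out)

x = torch.randn(2, 16, 64)  # 16 tokens, 64 channels
y = ChannelSelfAttention(64)(x)
```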