-
Publication No.: US20240312219A1
Publication date: 2024-09-19
Application No.: US18185074
Application date: 2023-03-16
Applicant: NVIDIA Corporation
Inventor: Jiwoong Choi, Jose Manuel Alvarez Lopez, Shiyi Lan, Yashar Asgarieh, Zhiding Yu
CPC classification number: G06V20/58, B60W60/001, B60W2420/403
Abstract: In various examples, temporal-based perception for autonomous or semi-autonomous systems and applications is described. Systems and methods are disclosed that use a machine learning model (MLM) to intrinsically fuse feature maps associated with different sensors and different instances in time. To generate a feature map, image data generated using image sensors (e.g., cameras) located around a vehicle are processed using an MLM that is trained to generate the feature map. The MLM may then fuse the feature maps in order to generate a final feature map associated with a current instance in time. The feature maps associated with the previous instances in time may be preprocessed using one or more layers of the MLM, where the one or more layers are associated with performing temporal transformation before the fusion is performed. The MLM may then use the final feature map to generate one or more outputs.
-
Publication No.: US20240104842A1
Publication date: 2024-03-28
Application No.: US18472653
Application date: 2023-09-22
Applicant: NVIDIA Corporation
Inventor: Koki Nagano, Alexander Trevithick, Chao Liu, Eric Ryan Chan, Sameh Khamis, Michael Stengel, Zhiding Yu
IPC: G06T17/00, G06T5/20, G06T7/70, G06T7/90, G06V10/771
CPC classification number: G06T17/00, G06T5/20, G06T7/70, G06T7/90, G06V10/771, G06T2207/10024
Abstract: A method for generating, by an encoder-based model, a three-dimensional (3D) representation of a two-dimensional (2D) image is provided. The encoder-based model is trained to infer the 3D representation using a synthetic training data set generated by a pre-trained model. The pre-trained model is a 3D generative model that produces a 3D representation and a corresponding 2D rendering, which can be used to train a separate encoder-based model for downstream tasks like estimating a triplane representation, neural radiance field, mesh, depth map, 3D key points, or the like, given a single input image, using the pseudo ground truth 3D synthetic training data set. In a particular embodiment, the encoder-based model is trained to predict a triplane representation of the input image, which can then be rendered by a volume renderer according to pose information to generate an output image of the 3D scene from the corresponding viewpoint.
-
Publication No.: US20240062534A1
Publication date: 2024-02-22
Application No.: US17893038
Application date: 2022-08-22
Applicant: NVIDIA Corporation
Inventor: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Anima Anandkumar
CPC classification number: G06V10/82, G06V10/255, G06V10/94
Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
-
Publication No.: US20230015989A1
Publication date: 2023-01-19
Application No.: US17365877
Application date: 2021-07-01
Applicant: Nvidia Corporation
Inventor: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
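As a rough illustration of step (3) in the abstract above, the following Python sketch runs one left-to-right propagation pass over a single row of pixels. It is not the patented method: the function name `edge_gated_propagate`, the gate formula `affinity * (1 - edge)`, and the 1D scalar setting are all assumptions made for illustration. The abstract only states that the smoothing is controlled by both affinity values and edge values.

```python
def edge_gated_propagate(features, affinity, edges):
    """One left-to-right pass of gated smoothing over a 1D row of pixels.

    features: per-pixel feature values (scalars here for simplicity)
    affinity: per-pixel affinity to the previous pixel, in [0, 1]
    edges:    per-pixel semantic edge strength, in [0, 1]

    Hypothetical gate (an assumption, not taken from the patent):
    gate = affinity * (1 - edge), so a strong edge blocks propagation.
    """
    if not (len(features) == len(affinity) == len(edges)):
        raise ValueError("inputs must have equal length")

    refined = []
    prev = None
    for x, a, e in zip(features, affinity, edges):
        if prev is None:
            h = x  # first pixel has no predecessor to receive from
        else:
            gate = a * (1.0 - e)
            # blend the pixel's own value with the propagated value
            h = (1.0 - gate) * x + gate * prev
        refined.append(h)
        prev = h
    return refined


# Two flat regions (1.0 and 5.0) with a semantic edge at index 3.
row = [1.0, 1.0, 1.0, 5.0, 5.0]
affinity = [0.5] * 5
edges = [0.0, 0.0, 0.0, 1.0, 0.0]
refined = edge_gated_propagate(row, affinity, edges)
```

In this toy input the strong edge at index 3 closes the gate, so the 5.0 region is not blended with the 1.0 region to its left. With all edge values set to zero, the same row would be smoothed across that boundary.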
-
Publication No.: US20210334644A1
Publication date: 2021-10-28
Application No.: US16859360
Application date: 2020-04-27
Applicant: NVIDIA Corporation
Inventor: Zhiding Yu, Wuyang Chen, Anima Anandkumar
Abstract: Apparatuses, systems, and techniques to train one or more neural networks. In at least one embodiment, one or more neural networks are trained based, at least in part, on inferencing output from one or more second neural networks.
-
Publication No.: US20210279841A1
Publication date: 2021-09-09
Application No.: US16813589
Application date: 2020-03-09
Applicant: NVIDIA Corporation
Inventor: Guilin Liu, Andrew Tao, Bryan Christopher Catanzaro, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum Reda, Karan Sapra, Brandon Rowlett
Abstract: Apparatuses, systems, and techniques for texture synthesis from small input textures in images using convolutional neural networks. In at least one embodiment, one or more convolutional layers are used in conjunction with one or more transposed convolution operations to generate a large textured output image from a small input textured image while preserving global features and texture, according to various novel techniques described herein.
-
Publication No.: US20200302176A1
Publication date: 2020-09-24
Application No.: US16357047
Application date: 2019-03-18
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Zhedong Zheng, Zhiding Yu
Abstract: A neural network is trained to perform a re-identification task in which it is determined whether one or more features present in a first image appear also in a second image. During training, a generative portion of one or more neural networks generates variations of an input image, and a discriminative portion of the one or more neural networks learns to perform the re-identification task based at least in part on the variations of the image. During training, the generative and discriminative portions of the one or more neural networks share an encoder which encodes information used by the generative and discriminative portions.
-
Publication No.: US20250029409A1
Publication date: 2025-01-23
Application No.: US18354431
Application date: 2023-07-18
Applicant: Nvidia Corporation
Inventor: Subhashree Radhakrishnan, Ramanathan Arunachahalam, Farzin Aghdasi, Zhiding Yu, Shiyi Lan
Abstract: Approaches are disclosed herein for an automatic segmentation labeling system that identifies objects for potential open-class categories and generates segmentation masks for objects. The disclosed system may use a training pipeline that trains two segmentation models. The training pipeline may take, as input, a set of images with bounding boxes and class labels. The set of images may be fed into a first segmentation network with the bounding boxes used as ground truth for weak supervision. The first segmentation network may be trained to generate pseudo segmentation masks. In a second stage, the trained first segmentation network is used to generate pseudo masks for a set of input images. The generated pseudo masks are provided as input, along with the corresponding images, to a second segmentation network to be used as a type of ground truth data for training the second segmentation network to generate high-quality segmentation masks.
-
Publication No.: US20250020481A1
Publication date: 2025-01-16
Application No.: US18681836
Application date: 2022-04-07
Applicant: NVIDIA Corporation
Inventor: Enze Xie, Zhiding Yu, Jonah Philion, Anima Anandkumar, Sanja Fidler, Jose Manuel Alvarez Lopez
Abstract: Apparatuses, systems, and techniques are presented to make determinations about objects in an environment. In at least one embodiment, a neural network can be used to determine one or more positions of one or more objects within a three-dimensional (3D) environment and to generate a segmented map of the 3D environment based, at least in part, on one or more two-dimensional (2D) images of the one or more objects.
-
Publication No.: US20240416963A1
Publication date: 2024-12-19
Application No.: US18379601
Application date: 2023-10-12
Applicant: NVIDIA Corporation
Inventor: Zhiqi Li, Zhiding Yu, David Austin, Shiyi Lan, Jan Kautz, Jose Manuel Alvarez Lopez
Abstract: Apparatuses, systems, and techniques of using one or more machine learning processes (e.g., neural network(s)) to predict occupancy using an image input. In at least one embodiment, image data is processed using a neural network to predict occupancy in a 3D voxel space. In at least one embodiment, image data is processed using a neural network to detect objects in a 3D space.