Patent search ap:("Nvidia Corporation") AND inv:"Zhiding Yu" Page 3

21.

Invention publication
LONG-RANGE 3D OBJECT DETECTION USING 2D BOUNDING BOXES (Under examination, published)

Publication number: US20240249538A1

Publication date: 2024-07-25

Application number: US18223473

Application date: 2023-07-18

Applicant: NVIDIA Corporation

Inventor: Zetong Yang , Zhiding Yu , Ren Hao Wang , Chris Choy , Anima Anandkumar , Jose M. Alvarez Lopez

IPC: G06V20/64 , G06T7/50 , G06T7/70 , G06T7/80 , G06V10/22 , G06V10/82

CPC classification number: G06V20/64 , G06T7/50 , G06T7/70 , G06T7/80 , G06V10/225 , G06V10/82 , G06T2207/20081 , G06T2207/20084 , G06T2207/30252 , G06V2201/07

Abstract: 3D object detection is a computer vision task that generally detects (e.g. classifies and localizes) objects in 3D space from the 2D images or videos that capture the objects. Current techniques used for 3D object detection rely on machine learning processes that learn to detect 3D objects from existing images annotated with high-quality 3D information including depth information generally obtained using lidar technology. However, due to lidar's limited measurable range, current machine learning solutions to 3D object detection do not support detection of 3D objects beyond the lidar range, which is needed for numerous applications, including autonomous driving applications where existing close or midrange 3D object detection does not always meet the safety-critical requirement of autonomous driving. The present disclosure provides for 3D object detection using a technique that supports long-range detection (i.e. detection beyond the lidar range).

22.

Invention publication
PERFORMING VISUAL RELATIONAL REASONING (Under examination, published)

Publication number: US20240078423A1

Publication date: 2024-03-07

Application number: US17893026

Application date: 2022-08-22

Applicant: NVIDIA Corporation

Inventor: Xiaojian Ma , Weili Nie , Zhiding Yu , Huaizu Jiang , Chaowei Xiao , Yuke Zhu , Anima Anandkumar

IPC: G06N3/08 , G06F16/55 , G06N3/04

CPC classification number: G06N3/08 , G06F16/55 , G06N3/04

Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.

23.

Invention publication
VIDEO INSTANCE SEGMENTATION (Under examination, published)

Publication number: US20240037756A1

Publication date: 2024-02-01

Application number: US18144071

Application date: 2023-05-05

Applicant: NVIDIA Corporation

Inventor: De-An Huang , Zhiding Yu , Anima Anandkumar

IPC: G06T7/20 , G06T5/20 , G06T7/70 , G06V10/74 , G06V10/82

CPC classification number: G06T7/20 , G06T5/20 , G06T7/70 , G06V10/761 , G06V10/82 , G06V2201/07 , G06T2207/20081

Abstract: Apparatuses, systems, and techniques to track one or more objects in one or more frames of a video. In at least one embodiment, one or more objects in one or more frames of a video are tracked based on, for example, one or more sets of embeddings.

24.

Granted invention patent
Image processing using coupled segmentation and edge learning (In force)

Publication number: US11790633B2

Publication date: 2023-10-17

Application number: US17365877

Application date: 2021-07-01

Applicant: Nvidia Corporation

Inventor: Zhiding Yu , Rui Huang , Wonmin Byeon , Sifei Liu , Guilin Liu , Thomas Breuel , Anima Anandkumar , Jan Kautz

IPC: G06V10/50 , G06N3/04 , G06T7/13 , G06V10/75 , G06F18/2413

CPC classification number: G06V10/50 , G06F18/2413 , G06N3/04 , G06T7/13 , G06V10/758

Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
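The edge-controlled smoothing in step (3) can be illustrated with a toy one-dimensional pass. This is only a sketch under stated assumptions, not the patented layer: the function name `edge_gated_smooth`, the scalar features, the single left-to-right scan, and the gate formula `affinity * (1 - edge)` are all hypothetical choices made for illustration.

```python
def edge_gated_smooth(features, affinity, edges):
    """One left-to-right propagation pass over a 1D row of scalar features.

    Assumption (not from the abstract): the gate at position i is
    affinity[i] * (1 - edges[i]), so a strong edge (value near 1)
    blocks smoothing across that position.
    """
    out = [features[0]]
    for i in range(1, len(features)):
        gate = affinity[i] * (1.0 - edges[i])
        # Blend the local feature with the value propagated from the left.
        out.append((1.0 - gate) * features[i] + gate * out[i - 1])
    return out


if __name__ == "__main__":
    row = [1.0, 0.0, 0.0]
    print(edge_gated_smooth(row, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
    print(edge_gated_smooth(row, [1.0, 1.0, 1.0], [0.0, 1.0, 0.0]))
```

With no edges, the first value spreads across the row; with an edge at the second position, propagation stops there and the values to the right are left unchanged.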

25.

Invention application
TRAJECTORY STITCHING FOR ACCELERATING DIFFUSION MODELS (In force)

Publication number: US20250103968A1

Publication date: 2025-03-27

Application number: US18821611

Application date: 2024-08-30

Applicant: NVIDIA Corporation

Inventor: Zizheng Pan , De-An Huang , Weili Nie , Zhiding Yu , Chaowei Xiao , Anima Anandkumar

IPC: G06N20/20

Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. Diffusion probabilistic models use discrete-time random processes or continuous-time stochastic differential equations (SDEs) that learn to gradually remove the noise added to the data points. With diffusion probabilistic models, high quality output currently requires sampling from a large diffusion probabilistic model, which comes at a high computational cost. The present disclosure stitches together the trajectory of two or more inferior diffusion probabilistic models during a denoising process, which can in turn accelerate the denoising process by avoiding use of only a single large diffusion probabilistic model.

26.

Invention application
BI-DIRECTIONAL FEATURE PROJECTION FOR 3D PERCEPTION SYSTEMS AND APPLICATIONS (In force)

Publication number: US20240378799A1

Publication date: 2024-11-14

Application number: US18642531

Application date: 2024-04-22

Applicant: NVIDIA Corporation

Inventor: Zhiqi Li , Zhiding Yu , Animashree Anandkumar , Jose Manuel Alvarez Lopez

IPC: G06T15/20 , G06T7/11 , G06T7/50 , G06V20/58

Abstract: In various examples, bi-directional projection techniques may be used to generate enhanced Bird's-Eye View (BEV) representations. For example, a system(s) may generate one or more BEV features associated with a BEV of an environment using a projection process that associates 2D image features to one or more first locations of a 3D space. At least partially using the BEV feature(s), the system(s) may determine one or more second locations of the 3D space that correspond to one or more regions of interest in the environment. The system(s) may then generate one or more additional BEV features corresponding to the second location(s) using a different projection process that associates the second location(s) from the 3D space to at least a portion of the 2D image features. The system(s) may then generate an updated BEV of the environment based at least on the BEV feature(s) and/or the additional BEV feature(s).

27.

Invention publication
POINT-LEVEL SUPERVISION FOR VIDEO INSTANCE SEGMENTATION (Under examination, published)

Publication number: US20240221166A1

Publication date: 2024-07-04

Application number: US18395198

Application date: 2023-12-22

Applicant: NVIDIA Corporation

Inventor: Zhiding Yu , Shuaiyi Huang , De-An Huang , Shiyi Lan , Subhashree Radhakrishnan , Jose M. Alvarez Lopez , Anima Anandkumar

IPC: G06T7/12 , G06V10/764 , G06V20/70

CPC classification number: G06T7/12 , G06V10/764 , G06V20/70 , G06T2207/20081

Abstract: Video instance segmentation is a computer vision task that aims to detect, segment, and track objects continuously in videos. It can be used in numerous real-world applications, such as video editing, three-dimensional (3D) reconstruction, 3D navigation (e.g. for autonomous driving and/or robotics), and view point estimation. However, current machine learning-based processes employed for video instance segmentation are lacking, particularly because the densely annotated videos needed for supervised training of high-quality models are not readily available and are not easily generated. To address the issues in the prior art, the present disclosure provides point-level supervision for video instance segmentation in a manner that allows the resulting machine learning model to handle any object category.

28.

Invention publication
CLASS AGNOSTIC OBJECT MASK GENERATION (Under examination, published)

Publication number: US20240169545A1

Publication date: 2024-05-23

Application number: US18355856

Application date: 2023-07-20

Applicant: NVIDIA Corporation

Inventor: Shiyi Lan , Zhiding Yu , Subhashree Radhakrishnan , Jose Manuel Alvarez Lopez , Animashree Anandkumar

IPC: G06T7/11 , G06T1/20

CPC classification number: G06T7/11 , G06T1/20 , G06T2207/20081 , G06T2207/20084 , G06T2207/20132

Abstract: Class agnostic object mask generation uses a vision transformer-based auto-labeling framework requiring only images and object bounding boxes to generate object (segmentation) masks. The generated object masks, images, and object labels may then be used to train instance segmentation models or other neural networks to localize and segment objects with pixel-level accuracy. The generated object masks may supplement or replace conventional human generated annotations. The human generated annotations may be misaligned compared with the object boundaries, resulting in poor quality labeled segmentation masks. In contrast with conventional techniques, the generated object masks are class agnostic and are automatically generated based only on a bounding box image region without relying on either labels or semantic information.

29.

Invention publication
SPARSE VOXEL TRANSFORMER FOR CAMERA-BASED 3D SEMANTIC SCENE COMPLETION (Under examination, published)

Publication number: US20240087222A1

Publication date: 2024-03-14

Application number: US18515016

Application date: 2023-11-20

Applicant: NVIDIA Corporation

Inventor: Yiming Li , Zhiding Yu , Christopher B. Choy , Chaowei Xiao , Jose Manuel Alvarez Lopez , Sanja Fidler , Animashree Anandkumar

IPC: G06T17/00 , B60W50/14 , G06T3/40 , G06V10/44 , G06V10/771 , G06V10/82

CPC classification number: G06T17/00 , B60W50/14 , G06T3/40 , G06V10/44 , G06V10/771 , G06V10/82

Abstract: An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.

30.

Invention publication
ESTIMATING OPTIMAL TRAINING DATA SET SIZE FOR MACHINE LEARNING MODEL SYSTEMS AND APPLICATIONS (Under examination, published)

Publication number: US20230385687A1

Publication date: 2023-11-30

Application number: US17828663

Application date: 2022-05-31

Applicant: NVIDIA Corporation

Inventor: Rafid Reza Mahmood , James Robert Lucas , David Jesus Acuna Marrero , Daiqing Li , Jonah Philion , Jose Manuel Alvarez Lopez , Zhiding Yu , Sanja Fidler , Marc Law

IPC: G06N20/00 , G06K9/62

CPC classification number: G06N20/00 , G06K9/6265

Abstract: Approaches for training data set size estimation for machine learning model systems and applications are described. Examples include a machine learning model training system that estimates target data requirements for training a machine learning model, given an approximate relationship between training data set size and model performance using one or more validation score estimation functions. To derive a validation score estimation function, a regression data set is generated from training data, and subsets of the regression data set are used to train the machine learning model. A validation score is computed for the subsets and used to compute regression function parameters to curve fit the selected regression function to the training data set. The validation score estimation function is then solved for and provides an output of an estimate of the number of additional training samples needed for the validation score estimation function to meet or exceed a target validation score.
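The curve-fit-then-solve idea in this abstract can be sketched in a few lines. The sketch below is an illustration under assumptions, not the method claimed in the application: it assumes a log-linear regression function, score ≈ a + b · ln(n), and the names `fit_log_linear` and `additional_samples_needed` are hypothetical. The abstract does not say which regression function is selected.

```python
import math


def fit_log_linear(sizes, scores):
    """Least-squares fit of score ≈ a + b * ln(size) (assumed functional form)."""
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def additional_samples_needed(sizes, scores, target_score):
    """Solve the fitted curve for the target and subtract the data already held."""
    a, b = fit_log_linear(sizes, scores)
    if b <= 0:
        raise ValueError("fitted curve does not improve with more data")
    required = math.ceil(math.exp((target_score - a) / b))
    return max(0, required - max(sizes))


if __name__ == "__main__":
    # Validation scores measured on training subsets of increasing size.
    subset_sizes = [100, 1000, 10000]
    subset_scores = [0.5, 0.6, 0.7]
    print(additional_samples_needed(subset_sizes, subset_scores, 0.8))
```

On these made-up measurements the score rises by 0.1 per tenfold increase in data, so reaching 0.8 takes roughly 90,000 additional samples beyond the 10,000 already available. Extrapolating this far past the measured range is exactly where the choice of regression function matters most.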
