-
Publication Number: US20240127075A1
Publication Date: 2024-04-18
Application Number: US18212629
Application Date: 2023-06-21
Applicant: NVIDIA Corporation
Inventor: Shalini De Mello , Christian Jacobsen , Xunlei Wu , Stephen Tyree , Alice Li , Wonmin Byeon , Shangru Li
IPC: G06N3/0985
CPC classification number: G06N3/0985
Abstract: Machine learning is a process that learns a model from a given dataset, where the model can then be used to make predictions about new data. To reduce the costs associated with collecting and labeling real-world datasets for training the model, computer processes can synthetically generate datasets that simulate real-world data. The present disclosure improves the effectiveness of such synthetic datasets for training machine learning models used in real-world applications, in particular by generating a synthetic dataset that is specifically targeted to a specified downstream task (e.g., a particular computer vision task or a particular natural language processing task).
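A minimal sketch of one way such targeting could work, not taken from the patent itself: candidate simulator parameters are scored by how well a model trained on the resulting synthetic data performs on a small real validation set for the downstream task. The callables train_on and evaluate_on are hypothetical placeholders supplied by the caller.

```python
import random

def select_simulator_params(candidate_params, train_on, evaluate_on, real_val_set,
                            trials=20, seed=0):
    """Random-search sketch: pick the simulator parameters whose synthetic data
    yields the best downstream validation score on real data.

    candidate_params: list of parameter dicts for a synthetic-data generator (assumed).
    train_on(params) -> model: renders a synthetic dataset from params and trains a task model on it.
    evaluate_on(model, val_set) -> float: downstream-task score on a real validation set.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = rng.choice(candidate_params)
        model = train_on(params)                    # train on synthetic data only
        score = evaluate_on(model, real_val_set)    # score on the real downstream task
        if score > best_score:
            best_params, best_score = params, score
    return best_params
```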
-
Publication Number: US20230177810A1
Publication Date: 2023-06-08
Application Number: US17853631
Application Date: 2022-06-29
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Wonmin Byeon , Thomas Breuel , Jan Kautz
IPC: G06V10/774 , G06V10/26
CPC classification number: G06V10/774 , G06V10/26
Abstract: Semantic segmentation is the task of providing pixel-wise annotations for a given image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. Each image/caption pair includes an image and an associated textual caption. The image portion of each pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and converted to text prompts, which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine, for each noun of each caption, the extracted feature that most closely matches the extracted features for the associated image.
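A minimal PyTorch sketch of a contrastive objective between pixel-group features and noun text features, under the simplifying assumption of one encoded noun per image and pre-computed, fixed-size group features; the actual loss formulation in the application may differ.

```python
import torch
import torch.nn.functional as F

def group_text_contrastive_loss(group_feats, text_feats, temperature=0.07):
    """Contrast pixel-group features against noun text features across a batch.

    group_feats: [B, G, D] features for G candidate pixel groupings per image.
    text_feats:  [B, D] one text feature per image (encoded prompt for one extracted noun).
    """
    group_feats = F.normalize(group_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # sim[i, j]: similarity of text j to the best-matching pixel group of image i.
    sim = torch.einsum("igd,jd->ijg", group_feats, text_feats).max(dim=-1).values  # [B, B]
    sim = sim / temperature

    targets = torch.arange(sim.size(0), device=sim.device)
    # Symmetric InfoNCE: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))
```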
-
Publication Number: US20230088912A1
Publication Date: 2023-03-23
Application Number: US17952866
Application Date: 2022-09-26
Applicant: NVIDIA Corporation
Inventor: Ruben Villegas , Alejandro Troccoli , Iuri Frosio , Stephen Tyree , Wonmin Byeon , Jan Kautz
Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
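A minimal PyTorch sketch of the described pipeline, with illustrative feature sizes and maneuver-class counts: a per-object LSTM encodes the state feature, a bi-directional LSTM over the set of observed objects encodes the spatial feature, and linear heads predict lateral and longitudinal maneuvers. The actual architecture may differ.

```python
import torch
import torch.nn as nn

class ManeuverPredictor(nn.Module):
    """Per-object state encoding, bi-directional spatial encoding over objects,
    and lateral / longitudinal maneuver heads (sizes are illustrative assumptions)."""

    def __init__(self, in_dim=4, state_dim=64, n_lat=3, n_lon=2):
        super().__init__()
        self.state_encoder = nn.LSTM(in_dim, state_dim, batch_first=True)
        self.spatial_encoder = nn.LSTM(state_dim, state_dim, batch_first=True, bidirectional=True)
        self.lat_head = nn.Linear(state_dim * 3, n_lat)   # state + bi-directional spatial feature
        self.lon_head = nn.Linear(state_dim * 3, n_lon)

    def forward(self, trajectories):
        # trajectories: [num_objects, time_steps, in_dim] past states observed by the ego-vehicle.
        _, (state, _) = self.state_encoder(trajectories)          # [1, num_objects, state_dim]
        state = state.squeeze(0)                                  # [num_objects, state_dim]
        spatial, _ = self.spatial_encoder(state.unsqueeze(0))     # [1, num_objects, 2 * state_dim]
        feats = torch.cat([state, spatial.squeeze(0)], dim=-1)    # [num_objects, 3 * state_dim]
        return self.lat_head(feats), self.lon_head(feats)
```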
-
Publication Number: US11062471B1
Publication Date: 2021-07-13
Application Number: US16868342
Application Date: 2020-05-06
Applicant: NVIDIA Corporation
Inventor: Yiran Zhong , Wonmin Byeon , Charles Loop , Stanley Thomas Birchfield
Abstract: Stereo matching generates a disparity map indicating pixel offsets between matched points in a stereo image pair. A neural network may be used to generate disparity maps in real time by matching image features in stereo images using only 2D convolutions. The proposed method is faster than 3D convolution-based methods, with only a slight accuracy loss and higher generalization capability. An efficient 3D cost aggregation volume is generated by combining cost maps for each disparity level. Different disparity levels correspond to different amounts of shift between pixels in the left and right image pair. In general, each disparity level is inversely proportional to a different distance from the viewpoint.
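A minimal PyTorch sketch of building per-disparity cost maps from 2D feature correlation and stacking them into a cost volume; the 2D feature encoder and the subsequent aggregation and disparity regression are assumed and not shown.

```python
import torch

def correlation_cost_volume(left_feat, right_feat, max_disparity):
    """Build a cost volume using only 2D feature correlation (no 3D convolution).

    left_feat, right_feat: [B, C, H, W] feature maps from a shared 2D encoder (assumed).
    Returns: [B, max_disparity, H, W], one correlation cost map per disparity level.
    """
    B, C, H, W = left_feat.shape
    cost_maps = []
    for d in range(max_disparity):
        cost = torch.zeros(B, H, W, device=left_feat.device)
        # Shift the right features by d pixels; larger shifts correspond to closer surfaces.
        cost[:, :, d:] = (left_feat[:, :, :, d:] * right_feat[:, :, :, : W - d]).mean(dim=1)
        cost_maps.append(cost)
    return torch.stack(cost_maps, dim=1)
```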
-
Publication Number: US20240153093A1
Publication Date: 2024-05-09
Application Number: US18310414
Application Date: 2023-05-01
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Arash Vahdat , Wonmin Byeon
CPC classification number: G06T7/10 , G06V10/40 , G06T2207/20081 , G06T2207/20084
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories not seen during training and encountered only during testing and inferencing. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of each object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
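A minimal sketch of the classification step described above, assuming the diffusion model's internal visual features and the predicted masks have already been extracted; mask-average pooling and cosine similarity are illustrative choices, not necessarily those used in the application.

```python
import torch
import torch.nn.functional as F

def classify_masks(pixel_feats, masks, label_text_feats):
    """Match each predicted mask against text features of candidate category labels.

    pixel_feats:      [D, H, W] semantic visual representation assumed extracted from the
                      text-conditioned diffusion model.
    masks:            [N, H, W] binary object masks.
    label_text_feats: [K, D] text features for K category labels (seen or unseen).
    Returns: [N] predicted label index per mask.
    """
    D, H, W = pixel_feats.shape
    flat = pixel_feats.reshape(D, H * W)                                   # [D, HW]
    m = masks.reshape(masks.size(0), H * W).float()                        # [N, HW]
    pooled = (m @ flat.t()) / m.sum(dim=1, keepdim=True).clamp(min=1.0)    # [N, D] mask-average pooling
    sim = F.normalize(pooled, dim=-1) @ F.normalize(label_text_feats, dim=-1).t()  # [N, K]
    return sim.argmax(dim=-1)
```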
-
Publication Number: US20240013462A1
Publication Date: 2024-01-11
Application Number: US17859615
Application Date: 2022-07-07
Applicant: Nvidia Corporation
Inventor: Yeongho Seol , Simon Yuen , Dmitry Aleksandrovich Korobchenko , Mingquan Zhou , Ronan Browne , Wonmin Byeon
CPC classification number: G06T13/205 , G06T13/40 , G06T17/20 , G10L25/63 , G10L15/16
Abstract: A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
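A minimal PyTorch sketch of an audio-to-animation network conditioned on an emotion/style vector, with one output head per modeled facial component; the region names, dimensions, and architecture are illustrative assumptions, not the patent's actual network.

```python
import torch
import torch.nn as nn

class EmotionConditionedFaceNet(nn.Module):
    """Map per-frame audio features plus a weighted emotion/style code to
    per-region motion or deformation outputs (sizes are illustrative)."""

    def __init__(self, audio_dim=128, emotion_dim=8, hidden=256, region_dims=None):
        super().__init__()
        region_dims = region_dims or {"head": 6, "skin": 300, "eyes": 4, "tongue": 30}
        self.backbone = nn.Sequential(
            nn.Linear(audio_dim + emotion_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One deformation head per modeled facial component.
        self.heads = nn.ModuleDict({name: nn.Linear(hidden, dim) for name, dim in region_dims.items()})

    def forward(self, audio_feat, emotion_vec):
        # audio_feat: [B, audio_dim] per-frame audio features.
        # emotion_vec: [B, emotion_dim] relative weighting of emotions plus style adjustments.
        h = self.backbone(torch.cat([audio_feat, emotion_vec], dim=-1))
        return {name: head(h) for name, head in self.heads.items()}
```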
-
Publication Number: US20220254029A1
Publication Date: 2022-08-11
Application Number: US17500338
Application Date: 2021-10-13
Applicant: NVIDIA Corporation
Inventor: Eugene Vorontsov , Wonmin Byeon , Shalini De Mello , Varun Jampani , Ming-Yu Liu , Pavlo Molchanov
Abstract: The neural network includes an encoder, a common decoder, and a residual decoder. The encoder encodes input images into a latent space. The latent space disentangles unique features from common features. The common decoder decodes common features resident in the latent space to generate translated images that lack the unique features. The residual decoder decodes unique features resident in the latent space to generate image deltas corresponding to the unique features. The neural network combines the translated images with the image deltas to generate combined images that may include both common and unique features. The combined images can be used to drive autoencoding. Once training is complete, the residual decoder can be modified to generate segmentation masks that indicate any regions of a given input image where a unique feature resides.
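A minimal PyTorch sketch of the encoder / common-decoder / residual-decoder layout; the mechanism that disentangles unique from common features in the latent space is not shown, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CommonResidualAutoencoder(nn.Module):
    """Shared encoder, a common decoder that reconstructs only common content,
    and a residual decoder that outputs an image delta for the unique features."""

    def __init__(self, channels=1, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent, 3, stride=2, padding=1), nn.ReLU(),
        )
        def make_decoder():
            return nn.Sequential(
                nn.ConvTranspose2d(latent, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
            )
        self.common_decoder = make_decoder()    # translated image without unique features
        self.residual_decoder = make_decoder()  # image delta holding the unique features

    def forward(self, x):
        z = self.encoder(x)
        translated = self.common_decoder(z)
        delta = self.residual_decoder(z)
        combined = translated + delta           # reconstruction used to drive autoencoding
        return translated, delta, combined
```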
-
Publication Number: US11989642B2
Publication Date: 2024-05-21
Application Number: US17952866
Application Date: 2022-09-26
Applicant: NVIDIA Corporation
Inventor: Ruben Villegas , Alejandro Troccoli , Iuri Frosio , Stephen Tyree , Wonmin Byeon , Jan Kautz
Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
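This is the granted counterpart of publication US20230088912A1 above. Complementing the maneuver-prediction sketch given there, the hypothetical decoder below rolls an object's combined feature and its predicted maneuver probabilities forward into future (x, y) locations; the horizon and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    """Decode future (x, y) locations from an object's combined feature and its
    predicted lateral / longitudinal maneuver probabilities (sizes are illustrative)."""

    def __init__(self, feat_dim=192, n_lat=3, n_lon=2, horizon=25):
        super().__init__()
        self.horizon = horizon
        self.decoder = nn.LSTM(feat_dim + n_lat + n_lon, 128, batch_first=True)
        self.to_xy = nn.Linear(128, 2)

    def forward(self, feats, lat_probs, lon_probs):
        # feats: [N, feat_dim]; lat_probs: [N, n_lat]; lon_probs: [N, n_lon].
        cond = torch.cat([feats, lat_probs, lon_probs], dim=-1)
        steps = cond.unsqueeze(1).repeat(1, self.horizon, 1)   # one conditioning copy per future step
        hidden, _ = self.decoder(steps)                        # [N, horizon, 128]
        return self.to_xy(hidden)                              # [N, horizon, 2] future locations
```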
-
Publication Number: US20230146647A1
Publication Date: 2023-05-11
Application Number: US17520448
Application Date: 2021-11-05
Applicant: NVIDIA Corporation
Inventor: Wonmin Byeon , Shalini De Mello , Ankur Arjun Mali
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: Apparatuses, systems, and techniques to perform and facilitate preservation of neural coding network weights over time. In at least one embodiment, a convolutional neural coding network is trained using a set of tasks such that said convolutional neural coding network retains an ability to perform inferencing based on tasks from previous training.
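The abstract does not state how weights are preserved, so the sketch below only shows the sequential-task training loop with an optional, caller-supplied preservation penalty and a check that earlier tasks still inference correctly after each new task; everything beyond that structure is an assumption.

```python
import torch

def train_sequentially(model, tasks, loss_fn, preservation_penalty=None, lr=1e-3, epochs=1):
    """Sequentially train a classifier on a list of tasks and re-check earlier tasks.

    tasks: list of (train_loader, val_loader) pairs, one per task.
    preservation_penalty(model) -> scalar tensor: optional, caller-supplied regularizer
    standing in for the (unspecified) weight-preservation mechanism.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for task_id, (train_loader, _) in enumerate(tasks):
        for _ in range(epochs):
            for x, y in train_loader:
                loss = loss_fn(model(x), y)
                if preservation_penalty is not None:
                    loss = loss + preservation_penalty(model)   # keep earlier-task knowledge
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        # After each task, verify accuracy on all tasks seen so far.
        with torch.no_grad():
            for seen_id, (_, val_loader) in enumerate(tasks[: task_id + 1]):
                correct = sum((model(x).argmax(dim=-1) == y).sum().item() for x, y in val_loader)
                total = sum(len(y) for _, y in val_loader)
                print(f"after task {task_id}: accuracy on task {seen_id} = {correct / total:.3f}")
    return model
```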
-
Publication Number: US20230015989A1
Publication Date: 2023-01-19
Application Number: US17365877
Application Date: 2021-07-01
Applicant: Nvidia Corporation
Inventor: Zhiding Yu , Rui Huang , Wonmin Byeon , Sifei Liu , Guilin Liu , Thomas Breuel , Anima Anandkumar , Jan Kautz
Abstract: The disclosure provides a learning framework that unifies semantic segmentation and semantic edge detection. A learnable recurrent message-passing layer is disclosed in which semantic edges act as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
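A minimal sketch of one edge-gated spatial-propagation step, in which the propagation weight is the learned affinity attenuated by the semantic edge strength so that messages do not cross object boundaries. A full implementation would propagate in all four directions; the specific gating form shown is an illustrative assumption.

```python
import torch

def edge_gated_smoothing_step(feat, affinity, edge):
    """One left-to-right spatial-propagation pass over a semantic feature map.

    feat:     [B, C, H, W] semantic feature map.
    affinity: [B, 1, H, W] affinity values in [0, 1].
    edge:     [B, 1, H, W] semantic edge probabilities in [0, 1].
    """
    gate = affinity * (1.0 - edge)            # stop propagation where an edge is detected
    out = feat.clone()
    for x in range(1, feat.size(-1)):
        g = gate[..., x]                      # [B, 1, H], broadcast over channels
        out[..., x] = (1.0 - g) * feat[..., x] + g * out[..., x - 1]
    return out
```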