-
Publication number: US20240265690A1
Publication date: 2024-08-08
Application number: US18544840
Application date: 2023-12-19
Applicant: NVIDIA Corporation
Inventor: Animashree Anandkumar, Linxi Fan, Zhiding Yu, Chaowei Xiao, Shikun Liu
CPC classification number: G06V10/82, G06V10/811
Abstract: A vision-language model learns skills and domain knowledge via distinct and separate task-specific neural networks, referred to as experts. Each expert is independently optimized for a specific task, facilitating the use of domain-specific data and architectures that are not feasible with a single large neural network trained for multiple tasks. The vision-language model is implemented as an ensemble of pre-trained experts and is trained more efficiently than the single large neural network. During training, the vision-language model integrates specialized skills and domain knowledge, rather than trying to learn multiple tasks simultaneously, resulting in effective multi-modal learning.
-
Publication number: US20250073901A1
Publication date: 2025-03-06
Application number: US18239601
Application date: 2023-08-29
Applicant: NVIDIA Corporation
Inventor: Ajay Uday Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Shyam Narang, Linxi Fan, Yuke Zhu, Dieter Fox
Abstract: Apparatuses, systems, and techniques to generate data to train a robotic device to perform tasks. In at least one embodiment, one or more first videos of a robotic device performing a task are used to generate one or more second videos of the robotic device performing the task differently than depicted in the one or more first videos.
-