VISION-LANGUAGE MODEL WITH AN ENSEMBLE OF EXPERTS

    Publication number: US20240265690A1

    Publication date: 2024-08-08

    Application number: US18544840

    Filing date: 2023-12-19

    CPC classification number: G06V10/82 G06V10/811

    Abstract: A vision-language model learns skills and domain knowledge via distinct and separate task-specific neural networks, referred to as experts. Each expert is independently optimized for a specific task, facilitating the use of domain-specific data and architectures that are not feasible with a single large neural network trained for multiple tasks. The vision-language model is implemented as an ensemble of pre-trained experts and can be trained more efficiently than the single large neural network. During training, the vision-language model integrates the experts' specialized skills and domain knowledge, rather than trying to learn multiple tasks simultaneously, resulting in effective multi-modal learning.
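The ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the two toy experts (`brightness_expert`, `contrast_expert`), the `ExpertEnsemble` class, the linear fusion layer, and the choice to keep the experts fixed during training are all assumptions made for illustration. Each expert stands in for a pre-trained, task-specific network, and only the small fusion layer that combines their outputs is trained.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Expert:
    """Stand-in for a pre-trained task-specific network (hypothetical)."""
    name: str
    encode: Callable[[Sequence[float]], float]


def brightness_expert(pixels: Sequence[float]) -> float:
    # Toy "expert": mean pixel intensity.
    return sum(pixels) / len(pixels)


def contrast_expert(pixels: Sequence[float]) -> float:
    # Toy "expert": intensity range.
    return max(pixels) - min(pixels)


class ExpertEnsemble:
    """Combines fixed expert outputs with a trainable linear fusion layer."""

    def __init__(self, experts: Sequence[Expert]) -> None:
        self.experts = list(experts)
        # One trainable weight per expert; the experts themselves are not updated.
        self.weights: List[float] = [0.0] * len(self.experts)

    def features(self, pixels: Sequence[float]) -> List[float]:
        return [expert.encode(pixels) for expert in self.experts]

    def predict(self, pixels: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, self.features(pixels)))

    def train_step(self, pixels: Sequence[float], target: float, lr: float = 0.1) -> float:
        """One gradient step on squared error; returns the loss before the update."""
        feats = self.features(pixels)
        error = self.predict(pixels) - target
        self.weights = [w - lr * error * f for w, f in zip(self.weights, feats)]
        return error ** 2


if __name__ == "__main__":
    ensemble = ExpertEnsemble([
        Expert("brightness", brightness_expert),
        Expert("contrast", contrast_expert),
    ])
    image = [0.0, 0.5, 1.0]  # toy "image" as a flat list of intensities
    for _ in range(20):
        loss = ensemble.train_step(image, target=1.0)
    print(f"final loss: {loss:.4f}, prediction: {ensemble.predict(image):.4f}")
```

Because only the fusion weights are updated, the number of trainable parameters in this sketch grows with the number of experts rather than with their size, which is one way to read the abstract's efficiency claim.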
