1.
Publication No.: US20220327809A1
Publication Date: 2022-10-13
Application No.: US17809133
Filing Date: 2022-06-27
Inventors: Wei Li, Can Gao, Guocheng Niu, Xinyan Xiao, Hao Liu, Jiachen Liu, Hua Wu, Haifeng Wang
IPC: G06V10/778, G06V10/774, G06V10/26, G06F40/284
Abstract: A method for training a model based on multi-modal data joint learning includes: obtaining multi-modal data, where the multi-modal data include at least one type of single-modal data and at least one type of Pair multi-modal data; inputting the single-modal data and the Pair multi-modal data into a decoupling attention Transformer network model to respectively generate Token semantic representation features and cross-modal semantic representation features; and training the decoupling attention Transformer network model based on the Token semantic representation features and the cross-modal semantic representation features.
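The abstract describes the joint-learning flow but not its implementation. The minimal PyTorch sketch below assumes a toy DecoupledAttentionTransformer with a shared encoder and two projection heads (one for Token-level features, one for cross-modal features); every class name, tensor shape, and the placeholder joint loss is an illustrative assumption, not the patent's actual code.

```python
# Hypothetical sketch of the training flow in the abstract; names and losses are assumptions.
import torch
import torch.nn as nn

class DecoupledAttentionTransformer(nn.Module):
    """Toy stand-in: a shared Transformer encoder with two heads, one producing
    Token semantic representation features and one producing cross-modal features."""
    def __init__(self, dim=256, vocab=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.token_head = nn.Linear(dim, dim)        # Token-level representation head
        self.cross_modal_head = nn.Linear(dim, dim)  # cross-modal representation head

    def forward(self, tokens, image_feats=None):
        x = self.embed(tokens)
        if image_feats is not None:                  # paired multi-modal input: append image features
            x = torch.cat([x, image_feats], dim=1)
        h = self.encoder(x)
        return self.token_head(h), self.cross_modal_head(h.mean(dim=1))

model = DecoupledAttentionTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batches: single-modal text plus a text/image pair (random tensors stand in for real data).
text = torch.randint(0, 1000, (8, 16))
pair_text = torch.randint(0, 1000, (8, 16))
pair_image = torch.randn(8, 4, 256)

token_repr, _ = model(text)                          # single-modal branch
_, cross_repr = model(pair_text, pair_image)         # paired multi-modal branch

# Placeholder joint objective; the patent does not specify the exact training losses.
loss = token_repr.pow(2).mean() + cross_repr.pow(2).mean()
loss.backward()
opt.step()
```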
2.
Publication No.: US12277401B2
Publication Date: 2025-04-15
Application No.: US17502108
Filing Date: 2021-10-15
Inventors: Guocheng Niu, Wei Li, Can Gao, Xinyan Xiao, Hua Wu
IPC: G06F18/25, G06F40/205, G06F40/47, G06F40/58, G06N3/02
Abstract: The present disclosure describes a method and apparatus for acquiring a pre-trained model, relating to natural language processing and deep learning technologies in the field of artificial intelligence. An implementation includes: acquiring training data, the training data including a single-modal language material and a multi-modal language material, the multi-modal language material being a language material pair formed by a first-modal language material and a second-modal language material; and performing a multi-task training operation on a pre-trained model using the training data, the multi-task operation including at least one cross-modal contrastive learning task and at least one single-modal learning task. The pre-trained language model obtained in the present disclosure may learn from different forms of language materials, i.e., the single-modal language material and the multi-modal language material, such that it may effectively process information in various modalities.
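As a rough illustration of the multi-task setup (not the disclosed implementation), the sketch below combines a cross-modal contrastive loss over paired first-/second-modal embeddings with a single-modal masked-token loss; every function name, tensor shape, and hyperparameter is an assumption made for this example.

```python
# Illustrative multi-task objective: cross-modal contrastive learning + single-modal masked LM.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """InfoNCE-style cross-modal loss: matching pairs on the diagonal are positives."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def masked_lm_loss(token_logits, labels):
    """Single-modal task: predict masked tokens; label -100 marks unmasked positions."""
    return F.cross_entropy(token_logits.view(-1, token_logits.size(-1)),
                           labels.view(-1), ignore_index=-100)

# Dummy tensors standing in for outputs of the shared pre-trained encoder.
batch, seq_len, vocab, dim = 8, 16, 1000, 256
text_emb, image_emb = torch.randn(batch, dim), torch.randn(batch, dim)
token_logits = torch.randn(batch, seq_len, vocab)
labels = torch.full((batch, seq_len), -100)
labels[:, :3] = torch.randint(0, vocab, (batch, 3))   # a few masked positions per sequence

# Multi-task objective: sum of the cross-modal and single-modal losses.
loss = contrastive_loss(text_emb, image_emb) + masked_lm_loss(token_logits, labels)
print(float(loss))
```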