-
Publication No.: US20230162490A1
Publication date: 2023-05-25
Application No.: US17589725
Filing date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Shu Zhang, Junnan Li, Ran Xu, Caiming Xiong, Chetan Ramaiah
IPC: G06V10/776, G06V10/74, G06F40/284, G06F40/166, G06F40/126, G06V10/80, G06F16/583, G06F16/56
CPC classification number: G06V10/776, G06V10/761, G06F40/284, G06F40/166, G06F40/126, G06V10/806, G06F16/5846, G06F16/56
Abstract: Embodiments described herein provide a CROss-Modal Distribution Alignment (CROMDA) model for vision-language pretraining, which can be used for downstream retrieval tasks. In the CROMDA model, global cross-modal representations are aligned on each modality. Specifically, a uni-modal global similarity between an image (or text) and the image (or text) feature queue is computed. A softmax-normalized distribution is then generated from the computed similarities. The distribution thus takes advantage of the global structure of the queue. CROMDA then aligns the two distributions and learns a modality-invariant global representation. In this way, CROMDA obtains an invariance property in each modality: images with similar text representations should be similar, and vice versa.
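The alignment step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name the divergence used to align the two distributions, so the symmetric KL divergence below is an assumption, as are the cosine similarity, the temperature value, the function names, and the premise that the i-th entries of the image queue and the text queue come from the same image-text pair.

```python
import math

def _unit(v):
    # L2-normalize a feature vector so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def queue_distribution(feature, queue, temperature=0.07):
    """Softmax-normalized similarity of one feature to every queue entry."""
    f = _unit(feature)
    scaled = [_dot(f, _unit(q)) / temperature for q in queue]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(image_feat, text_feat, image_queue, text_queue,
                   temperature=0.07):
    """Assumed symmetric-KL alignment of the two uni-modal distributions."""
    # Image vs. image queue, and text vs. text queue (uni-modal similarities).
    p_img = queue_distribution(image_feat, image_queue, temperature)
    p_txt = queue_distribution(text_feat, text_queue, temperature)
    return 0.5 * (kl_divergence(p_img, p_txt) + kl_divergence(p_txt, p_img))
```

Under these assumptions, the loss is zero when an image and its paired text rank the queue entries identically, and it grows as the two rankings diverge.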
-
Publication No.: US11495011B2
Publication date: 2022-11-08
Application No.: US16988536
Filing date: 2020-08-07
Applicant: salesforce.com, inc.
Inventor: Shu Zhang, Chetan Ramaiah, Ran Xu, Caiming Xiong
Abstract: The system has a form analysis module that receives an image of a form into which values have been filled for the possible fields of information on the form, such as first name, address, age, and the like. By using a library of form templates, the form analysis module allows both flexibility of form processing and simplicity for the user. That is, the techniques used by the form analysis module allow the processing of any form image for which the library has a form template example. The form image need not precisely match any form template, but rather may be scaled or shifted relative to a corresponding template. Additionally, the user need only provide the form image itself, without providing any additional exemplars, metadata for training, or the like.
-
Publication No.: US20220300761A1
Publication date: 2022-09-22
Application No.: US17328779
Filing date: 2021-05-24
Applicant: salesforce.com, inc.
Inventor: Shu Zhang, Chetan Ramaiah, Caiming Xiong, Ran Xu
Abstract: Embodiments described herein provide a hierarchical multi-label framework to learn an embedding function that may capture the hierarchical relationship between classes at different levels in the hierarchy. Specifically, a supervised contrastive learning framework may be extended to the hierarchical multi-label setting. Each data point has multiple dependent labels, and the relationship between labels is represented as a hierarchy of labels. The relationship between the different levels of labels may then be learned by a contrastive learning framework.
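One way to picture the extension of supervised contrastive learning to hierarchical labels is the sketch below. It is an illustration under stated assumptions, not the method claimed in the application: each sample's labels are assumed to form a root-to-leaf path (for example `("animal", "dog")`), two samples are treated as positives at a given level when their paths agree down to that level, and the per-level losses are combined with hypothetical `level_weights`. All function names are invented for this example.

```python
import math

def _unit(v):
    # L2-normalize an embedding so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def supcon_loss(embeddings, is_positive, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have positives."""
    z = [_unit(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        logits = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        positives = [j for j in logits if is_positive(i, j)]
        if not positives:
            continue  # an anchor without positives contributes nothing
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def hierarchical_loss(embeddings, label_paths, level_weights, temperature=0.1):
    """Weighted sum of per-level losses (the weighting scheme is an assumption)."""
    loss = 0.0
    for level, weight in enumerate(level_weights):
        def same_prefix(i, j, depth=level + 1):
            # Positives at this level share the label path down to `depth`.
            return label_paths[i][:depth] == label_paths[j][:depth]
        loss += weight * supcon_loss(embeddings, same_prefix, temperature)
    return loss
```

In this sketch, samples that share only a coarse label are pulled together by the shallow levels alone, while samples that share the full path are pulled together at every level.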
-
-