-
Publication number: US20210295091A1
Publication date: 2021-09-23
Application number: US16870621
Application date: 2020-05-08
Applicant: salesforce.com, inc.
Inventor: Junnan Li , Chu Hong Hoi
Abstract: The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues to solve the unsupervised learning task. PCL introduces prototypes as latent variables to help find the maximum-likelihood estimate of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step, which finds prototypes by clustering, and an M-step, which optimizes the network on a contrastive loss.
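The E-step/M-step alternation described in the abstract can be sketched as follows. This is an illustrative outline only, not the patented implementation: the k-means E-step, the toy ProtoNCE-style loss, and all parameter choices (cluster count, temperature) are assumptions made for the sketch.

```python
import numpy as np

def e_step(embeddings, k, iters=10, seed=0):
    """E-step: find k prototypes by k-means clustering of the embeddings."""
    rng = np.random.default_rng(seed)
    protos = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest prototype
        d = np.linalg.norm(embeddings[:, None] - protos[None], axis=-1)
        assign = d.argmin(axis=1)
        # move each prototype to the mean of its assigned embeddings
        for c in range(k):
            if (assign == c).any():
                protos[c] = embeddings[assign == c].mean(axis=0)
    return protos, assign

def proto_nce_loss(embeddings, protos, assign, temp=0.1):
    """M-step objective: a contrastive loss pulling each embedding toward
    its own prototype and away from the other prototypes."""
    logits = embeddings @ protos.T / temp
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(assign)), assign].mean()
```

In the full method the two steps alternate: the network embeddings feed `e_step`, and the resulting loss is minimized over the network parameters before re-clustering.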
-
Publication number: US12210976B2
Publication date: 2025-01-28
Application number: US17219339
Application date: 2021-03-31
Applicant: Salesforce.com, Inc.
Inventor: Hualin Liu , Chu Hong Hoi , Junnan Li
IPC: G06N3/084 , G06F18/214 , G06F18/22 , G06N3/088 , G06V10/75
Abstract: Embodiments described herein provide systems and methods for learning representations from unlabeled videos. Specifically, a method may comprise generating a set of strongly-augmented samples and a set of weakly-augmented samples from the unlabeled video samples; generating a set of predictive logits by inputting the set of strongly-augmented samples into a student model and a first teacher model; generating a set of artificial labels by inputting the set of weakly-augmented samples into a second teacher model that operates in parallel to the first teacher model, wherein the second teacher model shares one or more model parameters with the first teacher model; computing a loss objective based on the set of predictive logits and the set of artificial labels; updating student model parameters based on the loss objective via backpropagation; and updating the shared parameters for the first teacher model and the second teacher model based on the updated student model parameters.
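One training step of the student/teacher scheme described above can be sketched as follows. A linear classifier stands in for each network, Gaussian noise stands in for the augmentations, and the exponential-moving-average update for the shared teacher parameters is an assumption; none of this is the patented implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def training_step(x, W_student, W_teacher, lr=0.1, ema=0.99, rng=None):
    """One student/teacher step: strong view -> student logits,
    weak view -> artificial labels, cross-entropy loss, EMA teacher."""
    if rng is None:
        rng = np.random.default_rng(0)
    strong = x + rng.normal(0, 0.5, x.shape)   # strongly-augmented view
    weak = x + rng.normal(0, 0.05, x.shape)    # weakly-augmented view
    preds = softmax(strong @ W_student)        # student predictions
    labels = softmax(weak @ W_teacher)         # artificial labels (teacher)
    # cross-entropy between artificial labels and student predictions
    loss = -(labels * np.log(preds + 1e-9)).sum(axis=1).mean()
    # softmax-cross-entropy gradient w.r.t. the student weights
    grad = strong.T @ (preds - labels) / len(x)
    W_student = W_student - lr * grad
    # shared teacher parameters track the student via a moving average
    W_teacher = ema * W_teacher + (1 - ema) * W_student
    return W_student, W_teacher, loss
```

Repeating this step over batches of unlabeled clips mirrors the described loop: the student is trained by backpropagation, and the teacher's shared parameters are refreshed from the updated student.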
-
Publication number: US11776236B2
Publication date: 2023-10-03
Application number: US17591121
Application date: 2022-02-02
Applicant: salesforce.com, inc.
Inventor: Junnan Li , Chu Hong Hoi
IPC: G06K9/62 , G06V10/44 , G06T7/73 , G06F18/23 , G06F18/214 , G06V10/762 , G06V10/774 , G06V10/776 , G06V10/82
CPC classification number: G06V10/454 , G06F18/2155 , G06F18/23 , G06T7/73 , G06V10/763 , G06V10/776 , G06V10/7753 , G06V10/82 , G06T2207/20084
Abstract: The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues to solve the unsupervised learning task. PCL introduces prototypes as latent variables to help find the maximum-likelihood estimate of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step, which finds prototypes by clustering, and an M-step, which optimizes the network on a contrastive loss.
-
Publication number: US20230154188A1
Publication date: 2023-05-18
Application number: US17566173
Application date: 2021-12-30
Applicant: salesforce.com, inc.
Inventor: Dongxu Li , Junnan Li , Chu Hong Hoi
IPC: G06V20/40 , G06V10/74 , G06V10/26 , G06V10/80 , G06F40/284
CPC classification number: G06V20/41 , G06V10/761 , G06V20/47 , G06V10/26 , G06V10/806 , G06F40/284
Abstract: Embodiments described herein provide a method of video-text pre-training that effectively learns cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework provides a video-and-language pre-training framework that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
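The independent-encoding step described above can be sketched as follows: frames and tokens are projected separately and pooled, and a video-text similarity matrix is formed whose diagonal holds matched pairs. The linear projections and mean-pooling are stand-ins for the transformer encoders (an assumption for the sketch); the cross-modal encoder and prompting entity modeling are not shown.

```python
import numpy as np

def encode_video(frames, W_v):
    """Stand-in video encoder: project each frame, then mean-pool."""
    return (frames @ W_v).mean(axis=0)

def encode_text(tokens, W_t):
    """Stand-in text encoder: project each token embedding, then mean-pool."""
    return (tokens @ W_t).mean(axis=0)

def video_text_alignment(videos, texts, W_v, W_t, temp=0.07):
    """Similarity matrix between a batch of videos and texts; the
    diagonal holds matched pairs, as in a contrastive alignment loss."""
    v = np.stack([encode_video(f, W_v) for f in videos])
    t = np.stack([encode_text(s, W_t) for s in texts])
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return v @ t.T / temp
```

In the described framework, this alignment between unimodal features precedes the multi-modal encoder that models finer cross-modal interaction.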
-
Publication number: US20230162490A1
Publication date: 2023-05-25
Application number: US17589725
Application date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Shu Zhang , Junnan Li , Ran Xu , Caiming Xiong , Chetan Ramaiah
IPC: G06V10/776 , G06V10/74 , G06F40/284 , G06F40/166 , G06F40/126 , G06V10/80 , G06F16/583 , G06F16/56
CPC classification number: G06V10/776 , G06V10/761 , G06F40/284 , G06F40/166 , G06F40/126 , G06V10/806 , G06F16/5846 , G06F16/56
Abstract: Embodiments described herein provide a CROss-Modal Distribution Alignment (CROMDA) model for vision-language pretraining, which can be used for retrieval downstream tasks. In the CROMDA model, global cross-modal representations are aligned on each unimodality. Specifically, a uni-modal global similarity between an image/text and the image/text feature queue is computed. A softmax-normalized distribution is then generated based on the computed similarity. The distribution thus takes advantage of the global structure of the queue. CROMDA then aligns the two distributions and learns a modal-invariant global representation. In this way, CROMDA obtains an invariance property in each modality, where images with similar text representations should be similar and vice versa.
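The distribution-alignment step described above can be sketched as follows. The sketch assumes the image and text queues hold paired features of equal length, and measures distribution disagreement with a symmetric KL term; the choice of divergence and the temperature are assumptions, not details taken from the patent.

```python
import numpy as np

def softmax(z, temp=0.1):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_distribution_alignment(img_feat, txt_feat, img_queue, txt_queue):
    """Uni-modal similarity distributions over each feature queue,
    aligned with a symmetric KL divergence."""
    p_img = softmax(img_feat @ img_queue.T)  # image vs. image queue
    p_txt = softmax(txt_feat @ txt_queue.T)  # text vs. text queue
    def kl(p, q):
        # epsilon-smoothed KL divergence, summed over queue entries
        return (p * np.log((p + 1e-9) / (q + 1e-9))).sum(axis=-1)
    return 0.5 * (kl(p_img, p_txt) + kl(p_txt, p_img)).mean()
```

Minimizing this term drives the image-side and text-side distributions over the queue toward each other, which is the modal-invariance property the abstract describes.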
-
Publication number: US20230154146A1
Publication date: 2023-05-18
Application number: US17566061
Application date: 2021-12-30
Applicant: salesforce.com, inc.
Inventor: Dongxu Li , Junnan Li , Chu Hong Hoi
IPC: G06V10/74 , G06V10/774 , G06F40/279 , G06V20/40 , G06V10/776
CPC classification number: G06V10/761 , G06V10/774 , G06F40/279 , G06V20/47 , G06V20/41 , G06V10/776 , G06V20/46
Abstract: Embodiments described herein provide a method of video-text pre-training that effectively learns cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework provides a video-and-language pre-training framework that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
-
Publication number: US11263476B2
Publication date: 2022-03-01
Application number: US16870621
Application date: 2020-05-08
Applicant: salesforce.com, inc.
Inventor: Junnan Li , Chu Hong Hoi
Abstract: The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues to solve the unsupervised learning task. PCL introduces prototypes as latent variables to help find the maximum-likelihood estimate of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step, which finds prototypes by clustering, and an M-step, which optimizes the network on a contrastive loss.
-
Publication number: US20210374553A1
Publication date: 2021-12-02
Application number: US17015858
Application date: 2020-09-09
Applicant: salesforce.com, inc.
Inventor: Junnan Li , Chu Hong Hoi
Abstract: Embodiments described herein provide systems and methods for noise-robust contrastive learning. In view of the need for a noise-robust learning system, embodiments described herein provide a contrastive learning mechanism that combats noise by learning robust representations of the noisy data samples. Specifically, the training images are projected into a low-dimensional subspace, and the geometric structure of the subspace is regularized with: (1) a consistency contrastive loss that enforces images with perturbations to have similar embeddings; and (2) a prototypical contrastive loss augmented with a predetermined learning principle, which encourages the embedding for a linearly-interpolated input to have the same linear relationship with respect to the class prototypes. The low-dimensional embeddings are also trained to reconstruct the high-dimensional features, which preserves the learned information and regularizes the classifier.
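The two regularizers described above can be sketched as follows. The mean-squared consistency term, the softmax prototype scores, and the identity stand-in for the encoder are assumptions made for illustration; the reconstruction term is omitted for brevity.

```python
import numpy as np

def consistency_loss(z1, z2):
    """Consistency term: perturbed views of the same image should have
    similar low-dimensional embeddings (here, mean squared distance)."""
    return ((z1 - z2) ** 2).sum(axis=1).mean()

def interpolation_loss(embed, x_a, x_b, protos, lam=0.3, temp=0.1):
    """Linear-interpolation term: the interpolated input's prototype
    scores should be the same linear mix of the endpoints' scores."""
    def scores(z):
        logits = z @ protos.T / temp
        logits = logits - logits.max(axis=-1, keepdims=True)
        e = np.exp(logits)
        return e / e.sum(axis=-1, keepdims=True)
    mixed = embed(lam * x_a + (1 - lam) * x_b)
    target = lam * scores(embed(x_a)) + (1 - lam) * scores(embed(x_b))
    return ((scores(mixed) - target) ** 2).sum(axis=1).mean()
```

Summing the two terms (plus the omitted reconstruction loss) over batches regularizes the geometry of the low-dimensional subspace as the abstract describes.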