Patent search: ap:("salesforce.com, inc.") AND inv:"Dongxu Li"

1.

Published invention application
SYSTEMS AND METHODS FOR VIDEO AND LANGUAGE PRE-TRAINING (under examination, published)

Publication number: US20230154188A1

Publication date: 2023-05-18

Application number: US17566173

Filing date: 2021-12-30

Applicant: salesforce.com, inc.

Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi

IPC: G06V20/40, G06V10/74, G06V10/26, G06V10/80, G06F40/284

CPC classification number: G06V20/41, G06V10/761, G06V20/47, G06V10/26, G06V10/806, G06F40/284

Abstract: Embodiments describe a method of video-text pre-learning to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align and prompt framework provides a video and language pre-training framework that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.

2.

Published invention application
SYSTEMS AND METHODS FOR VIDEO AND LANGUAGE PRE-TRAINING (under examination, published)

Publication number: US20230154146A1

Publication date: 2023-05-18

Application number: US17566061

Filing date: 2021-12-30

Applicant: salesforce.com, inc.

Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi

IPC: G06V10/74, G06V10/774, G06F40/279, G06V20/40, G06V10/776

CPC classification number: G06V10/761, G06V10/774, G06F40/279, G06V20/47, G06V20/41, G06V10/776, G06V20/46

Abstract: Embodiments describe a method of video-text pre-learning to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align and prompt framework provides a video and language pre-training framework that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
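
Note: the data flow named in the abstract (independent video and text encoding, a cross-modal step, and region-entity scoring) can be illustrated with a toy sketch. This is not the patented implementation. Every function name, the mean-pooling "encoders", the projection-free attention step, and the 2-D feature values are assumptions made purely for illustration; the patent describes transformer-based encoders, which are not reproduced here.

```python
import math

def mean_pool(vectors):
    """Average equal-length vectors (toy stand-in for a real encoder)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(text_vecs, frame_vecs):
    """One projection-free attention step: each text token attends over frames."""
    dim = len(frame_vecs[0])
    fused = []
    for t in text_vecs:
        weights = softmax([dot(t, f) for f in frame_vecs])
        fused.append([sum(w * f[i] for w, f in zip(weights, frame_vecs))
                      for i in range(dim)])
    return fused

def entity_distribution(region_vec, prompt_vecs):
    """Score one video region against each entity prompt embedding."""
    return softmax([cosine(region_vec, p) for p in prompt_vecs])

# Hypothetical 2-D features: two sparsely sampled frames, two entity tokens.
frames = [[1.0, 0.0], [0.8, 0.2]]
token_vecs = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}

video_vec = mean_pool(frames)               # video encoded without seeing the text
text_vec = mean_pool([token_vecs["dog"]])   # text encoded without seeing the video
alignment = cosine(video_vec, text_vec)     # video-text similarity
fused = cross_attend(list(token_vecs.values()), frames)  # cross-modal interaction
# The pooled video vector stands in for a cropped region here (an assumption).
probs = entity_distribution(video_vec, [token_vecs["dog"], token_vecs["car"]])
print(alignment, probs)
```

In this sketch the two modalities are encoded separately and only meet in `cross_attend` and in the similarity scores, which mirrors the ordering the abstract gives. How the actual framework builds its prompts or selects regions is not stated in the abstract and is left out.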
