-
Publication No.: US12235850B2
Publication date: 2025-02-25
Application No.: US17588022
Filing date: 2022-01-28
Applicant: Salesforce, Inc.
Inventor: Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Chetan Ramaiah
IPC: G06F16/2455, G06F16/242, G06N20/00
Abstract: Embodiments described herein provide a framework based on cross-domain bootstrapping for online domain adaptation, in which the target domain streaming data is deleted immediately after being adapted. At each online query, the data diversity is increased across domains by bootstrapping the source domain to form diverse combinations with the current target query. To fully take advantage of the valuable discrepancies among the diverse combinations, a set of independent learners is trained to preserve the differences. The knowledge of the learners is then integrated by exchanging their predicted pseudo-labels on the current target query to co-supervise the learning on the target domain, but without sharing the weights, so as to maintain the learners' divergence.
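The bootstrapping and pseudo-label exchange described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `bootstrap_combinations` and `exchange_pseudo_labels` are hypothetical, sampling with replacement is assumed as the bootstrapping scheme, and the ring-style exchange (each learner is supervised by the next learner's labels) is one assumed way of exchanging labels without sharing weights.

```python
import random


def bootstrap_combinations(source, target_query, num_learners, seed=0):
    """Build one (resampled source, target query) pair per learner.

    Assumption: "bootstrapping" means sampling the source set with
    replacement, so every learner sees a different source mixture.
    """
    rng = random.Random(seed)
    combos = []
    for _ in range(num_learners):
        resampled = [rng.choice(source) for _ in source]
        combos.append((resampled, list(target_query)))
    return combos


def exchange_pseudo_labels(predictions):
    """Give each learner the pseudo-labels predicted by the next learner.

    predictions[i] holds learner i's labels for the current target query.
    Only labels are exchanged; no model weights are shared.
    """
    k = len(predictions)
    return [predictions[(i + 1) % k] for i in range(k)]
```

With two learners the exchange reduces to a swap: learner 0 trains on learner 1's pseudo-labels and vice versa.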
-
Publication No.: US20250045567A1
Publication date: 2025-02-06
Application No.: US18498257
Filing date: 2023-10-31
Applicant: Salesforce, Inc.
Inventor: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
IPC: G06N3/0455, G06N3/092
Abstract: Embodiments described herein provide for optimizing a language model (LM) agent. In at least one embodiment, an LM agent comprises an “actor” LM and a “retrospective” LM, which provides reflections on attempts by the actor LM. The reflections are used to update subsequent prompts to the actor LM. Optimizing the LM agent comprises fine-tuning parameters of the retrospective LM while keeping parameters of the actor LM frozen. A gradient may be determined by a change in reward from the environment based on actions taken by the actor LM with and without a reflection of the retrospective LM. Using this gradient, parameters of the retrospective LM may be updated via backpropagation.
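As a rough illustration of the reward signal described in this abstract, the sketch below computes the change in reward with and without a reflection and uses it to weight the log-probability of that reflection. The abstract states only that the gradient is determined by the change in reward; the policy-gradient-style weighting and the names `reflection_reward_delta` and `retrospective_loss` are assumptions made for this example.

```python
def reflection_reward_delta(reward_with, reward_without):
    """Change in environment reward attributable to a reflection."""
    return reward_with - reward_without


def retrospective_loss(log_probs, deltas):
    """Scalar loss whose gradient favors reflections that raised the reward.

    log_probs[i]: log-probability the retrospective LM assigned to reflection i.
    deltas[i]:    reward change observed for reflection i.
    Assumption: a REINFORCE-style weighted log-likelihood objective.
    """
    weighted = [lp * d for lp, d in zip(log_probs, deltas)]
    return -sum(weighted) / len(weighted)
```

In a real training loop the log-probabilities would come from the retrospective LM and the loss would be backpropagated through that model only, leaving the actor LM frozen.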
-
Publication No.: US20240338961A1
Publication date: 2024-10-10
Application No.: US18746820
Filing date: 2024-06-18
Applicant: Salesforce, Inc.
Inventor: Mingfei Gao, Ran Xu
IPC: G06V30/412, G06F40/174, G06F40/205, G06N20/00, G06V30/19
CPC classification number: G06V30/412, G06F40/174, G06F40/205, G06N20/00, G06V30/19007
Abstract: An application server may receive an input document including a set of input text fields and an input key phrase querying a value for a key-value pair that corresponds to one or more of the set of input text fields. The application server may extract, using an optical character recognition model, a set of character strings and a set of two-dimensional locations of the set of character strings on a layout of the input document. After extraction, the application server may input the extracted set of character strings and the set of two-dimensional locations into a machine learned model that is trained to compute a probability that a character string corresponds to the value for the key-value pair. The application server may then identify the value for the key-value pair corresponding to the input key phrase and may output the identified value.
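The selection step at the end of this abstract can be sketched as picking the extracted string with the highest score. In the sketch below, `OcrToken`, `extract_value`, and `proximity_score` are hypothetical names, and the distance-based scorer is only a toy stand-in for the trained model, which the abstract does not specify.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OcrToken:
    text: str  # character string returned by OCR
    x: float   # horizontal position on the page layout
    y: float   # vertical position on the page layout


def extract_value(tokens: List[OcrToken], key_phrase: str,
                  score: Callable[[str, OcrToken, List[OcrToken]], float]) -> str:
    """Return the text of the token most likely to hold the value."""
    best = max(tokens, key=lambda t: score(key_phrase, t, tokens))
    return best.text


def proximity_score(key_phrase: str, token: OcrToken,
                    tokens: List[OcrToken]) -> float:
    """Toy stand-in for the learned model: closer to the key means likelier."""
    key = next((t for t in tokens if t.text == key_phrase), None)
    if key is None or token is key:
        return 0.0
    return 1.0 / (1.0 + math.hypot(token.x - key.x, token.y - key.y))
```

For example, on a form with "Total" at (0, 5) and "99.00" at (1, 5), querying the key phrase "Total" selects "99.00" because it is the nearest other string.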
-
Publication No.: US12112523B2
Publication date: 2024-10-08
Application No.: US17589725
Filing date: 2022-01-31
Applicant: Salesforce, Inc.
Inventor: Shu Zhang, Junnan Li, Ran Xu, Caiming Xiong, Chetan Ramaiah
IPC: G06V10/776, G06F16/56, G06F16/583, G06F40/126, G06F40/166, G06F40/284, G06V10/74, G06V10/80
CPC classification number: G06V10/776, G06F16/56, G06F16/5846, G06F40/126, G06F40/166, G06F40/284, G06V10/761, G06V10/806
Abstract: Embodiments described herein provide a CROss-Modal Distribution Alignment (CROMDA) model for vision-language pretraining, which can be used for retrieval downstream tasks. In the CROMDA model, global cross-modal representations are aligned on each unimodality. Specifically, a uni-modal global similarity between an image/text and the image/text feature queue is computed. A softmax-normalized distribution is then generated based on the computed similarity. The distribution thus takes advantage of the global structure of the queue. CROMDA then aligns the two distributions and learns a modal-invariant global representation. In this way, CROMDA is able to obtain an invariant property in each modality, where images with similar text representations should be similar and vice versa.
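The distribution alignment described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: dot-product similarity, a temperature of 0.1, and a symmetric KL divergence as the alignment loss are all choices made for the example, and the function names are hypothetical. It also assumes that entry i of the image queue and entry i of the text queue come from the same image-text pair.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(feature, queue, temperature=0.1):
    """Softmax-normalized similarity of one feature to a same-modality queue."""
    sims = [sum(a * b for a, b in zip(feature, item)) for item in queue]
    return softmax(sims, temperature)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(p_image, p_text):
    """Symmetric KL between the image-side and text-side distributions."""
    return 0.5 * (kl_divergence(p_image, p_text) + kl_divergence(p_text, p_image))
```

Minimizing this loss pushes an image's similarity pattern over the image queue toward its paired text's similarity pattern over the text queue, which is one way to read "aligns the two distributions".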
-
Publication No.: US12299982B2
Publication date: 2025-05-13
Application No.: US16931228
Filing date: 2020-07-16
Applicant: Salesforce, Inc.
Inventor: Mingfei Gao, Yingbo Zhou, Ran Xu, Caiming Xiong
IPC: G06V20/40, G06F17/18, G06F18/2113, G06F18/214, G06F18/2431, G06N3/084, G06V10/764, G06V10/82, G06V20/20
Abstract: Embodiments described herein provide systems and methods for a partially supervised training model for online action detection. Specifically, the online action detection framework may include two modules that are trained jointly: a Temporal Proposal Generator (TPG) and an Online Action Recognizer (OAR). In the training phase, OAR performs both online per-frame action recognition and start point detection. At the same time, TPG generates class-wise temporal action proposals serving as noisy supervisions for OAR. TPG is then optimized with the video-level annotations. In this way, the online action detection framework can be trained with video-category labels only without pre-annotated segment-level boundary labels.
-
Publication No.: US20250139411A1
Publication date: 2025-05-01
Application No.: US18498229
Filing date: 2023-10-31
Applicant: Salesforce, Inc.
Inventor: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
IPC: G06N3/0455, G06N3/084
Abstract: Embodiments described herein provide a large language model (LLM) based AI agent that adopts Monte-Carlo Tree Search (MCTS) to execute a task. The LLM is prompted with a task description and it responds with its first attempted list of actions. Based on the success or failure of the first attempt, the LLM is prompted with an updated prompt which includes feedback from the first attempt based on a determined reward. The prompt may include a relative “score” for each action taken at each step. A numeric score may be mapped to a set of pre-defined text labels, such as “high” or “low” value, putting the score in a form more suited for an LLM prompt. In this way, the LLM is iteratively given prompts which are updated with the scores from each action taken in each previous iteration so that it traverses different paths on the tree in each iteration.
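The score-to-label mapping mentioned in this abstract might look like the sketch below. The 0.5 threshold, the prompt wording, and the names `score_to_label` and `build_feedback_prompt` are assumptions for illustration; the abstract specifies only that numeric scores are mapped to pre-defined text labels such as "high" or "low".

```python
def score_to_label(score, threshold=0.5):
    """Map a numeric action score to a text label suited to an LLM prompt."""
    return "high" if score >= threshold else "low"


def build_feedback_prompt(task, scored_actions):
    """Compose an updated prompt that lists each prior action with its label.

    scored_actions: list of (action description, numeric score) tuples
    from the previous attempt.
    """
    lines = [f"Task: {task}", "Feedback on the previous attempt:"]
    for step, (action, score) in enumerate(scored_actions, start=1):
        lines.append(f"{step}. {action}: {score_to_label(score)} value")
    return "\n".join(lines)
```

For instance, `build_feedback_prompt("buy a red mug", [("search mug", 0.8), ("click first result", 0.2)])` yields a prompt whose step lines read "1. search mug: high value" and "2. click first result: low value".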
-
Publication No.: US12086698B2
Publication date: 2024-09-10
Application No.: US17484618
Filing date: 2021-09-24
Applicant: Salesforce, Inc.
Inventor: Mingfei Gao, Zeyuan Chen, Ran Xu
IPC: G06N20/20, G06N3/084, G06N5/01, G06N5/04, G06V30/412, G06V30/413
CPC classification number: G06N20/20, G06N3/084, G06N5/01, G06N5/04, G06V30/412, G06V30/413
Abstract: A field extraction system that does not require field-level annotations for training is provided. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. As the pseudo-labels are noisy, a refinement module that contains a sequence of branches is used to refine the pseudo-labels. Each of the refinement branches conducts field tagging and generates refined labels. At each stage, a branch is optimized by the labels ensembled from all previous branches to reduce label noise.
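The ensembling step in this abstract, where each branch is supervised by labels combined from all previous branches, can be sketched with a per-token majority vote. Majority voting is an assumption here, since the abstract does not say how the labels are ensembled, and `ensemble_labels` is a hypothetical name.

```python
from collections import Counter


def ensemble_labels(previous_branch_labels):
    """Combine per-token field tags from earlier branches by majority vote.

    previous_branch_labels: one list of token tags per earlier stage
    (the initial pseudo-labels count as the first stage).
    Ties resolve in favor of the tag seen first, i.e. the earliest stage.
    """
    ensembled = []
    for votes in zip(*previous_branch_labels):
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled
```

With initial pseudo-labels `["O", "total", "O"]` and two refined branches `["O", "total", "date"]` and `["O", "O", "date"]`, the third branch would be trained against `["O", "total", "date"]`.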
-
Publication No.: US20250068901A1
Publication date: 2025-02-27
Application No.: US18423081
Filing date: 2024-01-25
Applicant: Salesforce, Inc.
Inventor: Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese
IPC: G06N3/08
Abstract: Embodiments described herein provide a diffusion-based framework that is trained on a dataset with limited text labels to generate a distribution of data samples in the dataset given a specific text description label. Specifically, unlabeled data is first used to train the diffusion model to generate a data distribution of data samples given a specific text description label. Text-labeled data samples are then used to finetune the diffusion model to generate a data distribution given a specific text description label, thus enhancing controllability of training.
-
Publication No.: US20240169746A1
Publication date: 2024-05-23
Application No.: US18161661
Filing date: 2023-01-30
Applicant: Salesforce, Inc.
Inventor: Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu
CPC classification number: G06V20/64, G06T3/4007, G06V10/46, G06V10/82
Abstract: Embodiments described herein provide a system for three-dimensional (3D) object detection. The system includes an input interface configured to obtain 3D point data describing spatial information of a plurality of points, and a memory storing a neural network based 3D object detection model having an encoder and a decoder. The system also includes processors to perform operations including: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features; generating, by attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point features; and generating, by the decoder, a predicted bounding box among the plurality of points based at least in part on the set of attention weights.
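The attention-weight step in this abstract can be illustrated with standard scaled dot-product cross-attention, where object features act as queries and the sampled point features act as keys. The abstract does not state the attention formula, so the scaled dot-product form and the name `cross_attention_weights` are assumptions for this sketch; learned projection matrices are omitted for brevity.

```python
import math


def cross_attention_weights(object_features, point_features):
    """Return one row of attention weights per object feature.

    Each row is a softmax over the sampled point features, computed from
    dot products scaled by sqrt(feature dimension).
    """
    dim = len(point_features[0])
    weights = []
    for query in object_features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in point_features]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Each row sums to 1, and points whose features align with an object feature receive the largest weights for that object.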
-
Publication No.: US20240070868A1
Publication date: 2024-02-29
Application No.: US18159318
Filing date: 2023-01-25
Applicant: Salesforce, Inc.
Inventor: Ning Yu, Vibashan Vishnukumar Sharmini, Chen Xing, Juan Carlos Niebles Duque, Ran Xu
CPC classification number: G06T7/11, G06V10/273
Abstract: Embodiments described herein provide an open-vocabulary instance segmentation framework that adopts a pre-trained vision-language model to develop a pipeline for detecting novel categories of instances.