-
Publication No.: US12210976B2
Publication date: 2025-01-28
Application No.: US17219339
Filing date: 2021-03-31
Applicant: Salesforce.com, Inc.
Inventor: Hualin Liu, Chu Hong Hoi, Junnan Li
IPC: G06N3/084, G06F18/214, G06F18/22, G06N3/088, G06V10/75
Abstract: Embodiments described herein provide systems and methods for learning representation from unlabeled videos. Specifically, a method may comprise generating a set of strongly-augmented samples and a set of weakly-augmented samples from the unlabeled video samples; generating a set of predictive logits by inputting the set of strongly-augmented samples into a student model and a first teacher model; generating a set of artificial labels by inputting the set of weakly-augmented samples to a second teacher model that operates in parallel to the first teacher model, wherein the second teacher model shares one or more model parameters with the first teacher model; computing a loss objective based on the set of predictive logits and the set of artificial labels; updating student model parameters based on the loss objective via backpropagation; and updating the shared parameters for the first teacher model and the second teacher model based on the updated student model parameters.
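The final step of the abstract above, updating the shared teacher parameters from the updated student parameters, can be illustrated with a minimal sketch. The abstract does not name the update rule; an exponential moving average (EMA) is assumed here because it is a common choice for teacher-student training, and the function name `ema_update`, the `momentum` value, and the toy parameter dictionaries are all hypothetical.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               momentum: float = 0.99) -> Dict[str, float]:
    """Move each teacher parameter a small step toward the student's value.

    ASSUMPTION: an EMA rule; the abstract only says the teachers are updated
    "based on the updated student model parameters".
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Both teacher branches would read from this single dictionary,
# so one update serves the first and the second teacher alike.
shared_teacher = {"w": 1.0, "b": 0.0}
student = {"w": 2.0, "b": 1.0}  # pretend these came out of a backprop step

shared_teacher = ema_update(shared_teacher, student, momentum=0.9)
```

A momentum close to 1 keeps the teachers stable, which matters because their outputs on weakly-augmented samples serve as the artificial labels.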
-
Publication No.: US20230376746A1
Publication date: 2023-11-23
Application No.: US17939085
Filing date: 2022-09-07
Applicant: Salesforce.com, inc.
Inventor: Gerald Woo, Chenghao Liu, Doyen Sahoo, Chu Hong Hoi
CPC classification number: G06N3/08, G06N3/0481
Abstract: Embodiments described herein provide a time-index model for forecasting time-series data. The architecture of the model takes a normalized time index as an input, uses a model, g_φ, to produce a vector representation of the time-index, and uses a “ridge regressor” which takes the vector representation and provides an estimated value. The model may be trained on a time-series dataset. The ridge regressor is trained for a given g_φ to reproduce a given lookback window. g_φ is trained over time-indexes in a horizon window, such that g_φ and the corresponding ridge regressor will accurately predict the data in the horizon window. Once g_φ is sufficiently trained, the ridge regressor can be updated based on that final g_φ over a lookback window comprising the time-indexes with the last known values. The final g_φ together with the updated ridge regressor can be given time-indexes past the known values, thereby providing forecasted values.
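The ridge-regressor step described above can be sketched in closed form: given feature vectors g_φ(t) for the lookback window, the weights solve (ΦᵀΦ + λI) w = Φᵀy. The sketch below is only an illustration under stated assumptions: `g_phi` is a hypothetical fixed stand-in for the trained model g_φ (here simply `[t, 1]`), the data and the value of `lam` are invented, and the small linear solver exists only to keep the example free of third-party dependencies.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_ridge(features: List[List[float]], targets: List[float],
              lam: float) -> List[float]:
    """Closed-form ridge weights: (F^T F + lam * I) w = F^T y."""
    d = len(features[0])
    gram = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(d)]
    return solve_linear(gram, rhs)


def g_phi(t: float) -> List[float]:
    """HYPOTHETICAL stand-in for the trained representation model g_phi."""
    return [t, 1.0]


def predict(weights: List[float], t: float) -> float:
    return sum(w * f for w, f in zip(weights, g_phi(t)))


# Lookback window: normalized time indexes with known values (here y = 2t + 1).
lookback_t = [0.0, 0.25, 0.5, 0.75]
lookback_y = [1.0, 1.5, 2.0, 2.5]

weights = fit_ridge([g_phi(t) for t in lookback_t], lookback_y, lam=1e-8)
forecast = predict(weights, 1.0)  # a time index past the known values
```

Because the ridge weights have a closed form, they can be refit cheaply for each new lookback window while g_φ stays fixed, which matches the last two sentences of the abstract.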
-
Publication No.: US11776236B2
Publication date: 2023-10-03
Application No.: US17591121
Filing date: 2022-02-02
Applicant: salesforce.com, inc.
Inventor: Junnan Li, Chu Hong Hoi
IPC: G06K9/62, G06V10/44, G06T7/73, G06F18/23, G06F18/214, G06V10/762, G06V10/774, G06V10/776, G06V10/82
CPC classification number: G06V10/454, G06F18/2155, G06F18/23, G06T7/73, G06V10/763, G06V10/776, G06V10/7753, G06V10/82, G06T2207/20084
Abstract: The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues for solving the unsupervised learning task. PCL includes prototypes as the latent variables to help find the maximum-likelihood estimation of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step for finding prototypes with clustering and an M-step for optimizing the network on a contrastive loss.
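The alternation between the E-step and the M-step can be sketched roughly as follows. This is a simplified illustration, not the patented method: the E-step is assumed to be a single k-means-style pass using dot-product similarity, the M-step is reduced to evaluating a softmax contrastive loss against the prototypes (no network or gradient step is shown), and the function names, `temperature`, and toy embeddings are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def e_step(embeddings: List[Vector],
           prototypes: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Assign each embedding to its most similar prototype, then re-average.

    ASSUMPTION: one k-means-style pass stands in for the clustering step.
    """
    assignments = [
        max(range(len(prototypes)), key=lambda k: dot(v, prototypes[k]))
        for v in embeddings
    ]
    updated = []
    for k, old in enumerate(prototypes):
        members = [v for v, a in zip(embeddings, assignments) if a == k]
        if not members:  # keep an empty cluster's prototype unchanged
            updated.append(old)
            continue
        updated.append([sum(col) / len(members) for col in zip(*members)])
    return assignments, updated


def prototype_contrastive_loss(embeddings: List[Vector],
                               assignments: List[int],
                               prototypes: List[Vector],
                               temperature: float = 0.1) -> float:
    """Mean of -log softmax(v . c / temperature) at the assigned prototype."""
    total = 0.0
    for v, a in zip(embeddings, assignments):
        logits = [dot(v, c) / temperature for c in prototypes]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[a]
    return total / len(embeddings)


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

assignments, prototypes = e_step(embeddings, prototypes)
loss = prototype_contrastive_loss(embeddings, assignments, prototypes)
```

In a full training loop the loss would be minimized with respect to the encoder that produces `embeddings`, after which the E-step would run again on the refreshed embeddings.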
-
Publication No.: US20230154188A1
Publication date: 2023-05-18
Application No.: US17566173
Filing date: 2021-12-30
Applicant: salesforce.com, inc.
Inventor: Dongxu Li, Junnan Li, Chu Hong Hoi
IPC: G06V20/40, G06V10/74, G06V10/26, G06V10/80, G06F40/284
CPC classification number: G06V20/41, G06V10/761, G06V20/47, G06V10/26, G06V10/806, G06F40/284
Abstract: Embodiments described herein provide a method of video-text pre-learning to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align and prompt framework provides a video and language pre-training framework that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes a prompting entity modeling that enables the model to capture fine-grained region-entity alignment.
-
Publication No.: US11599730B2
Publication date: 2023-03-07
Application No.: US16870568
Filing date: 2020-05-08
Applicant: salesforce.com, inc.
Inventor: Chien-Sheng Wu, Chu Hong Hoi, Caiming Xiong
Abstract: Embodiments described in this disclosure illustrate the use of self-supervised and semi-supervised approaches for label-efficient dialogue state tracking (DST) in task-oriented dialogue systems. Conversational behavior is modeled by next response generation and turn utterance generation tasks. Prediction consistency is strengthened by augmenting data with stochastic word dropout and label guessing. Experimental results show that, by exploiting self-supervision, joint goal accuracy can be boosted with limited labeled data.
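Stochastic word dropout, one of the augmentations named above, can be sketched in a few lines. The abstract does not say how dropped words are represented; replacing them with an `<unk>` placeholder is an assumption made here, and the function name, `drop_prob` value, and example utterance are hypothetical.

```python
import random
from typing import List


def word_dropout(tokens: List[str], drop_prob: float, rng: random.Random,
                 unk_token: str = "<unk>") -> List[str]:
    """Independently replace each token with unk_token with probability drop_prob.

    ASSUMPTION: dropped words become a placeholder rather than being deleted,
    so the perturbed utterance keeps its original length.
    """
    return [unk_token if rng.random() < drop_prob else tok for tok in tokens]


utterance = "i need a cheap hotel in the north".split()
rng = random.Random(0)  # seeded only to make the sketch repeatable

# Several perturbed views of the same dialogue turn; a consistency objective
# would encourage the tracker to predict the same state for each of them.
views = [word_dropout(utterance, drop_prob=0.2, rng=rng) for _ in range(3)]
```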
-
Publication No.: US11562147B2
Publication date: 2023-01-24
Application No.: US16929738
Filing date: 2020-07-15
Applicant: salesforce.com, inc.
Inventor: Yue Wang, Chu Hong Hoi, Shafiq Rayhan Joty
Abstract: A visual dialogue model receives image input and text input that includes a dialogue history between the model and a current utterance by a human user. The model generates a unified contextualized representation using a transformer encoder network, in which the unified contextualized representation includes a token level encoding of the image input and text input. The model generates an encoded visual dialogue input from the unified contextualized representation using visual dialogue encoding layers. The encoded visual dialogue input includes a position level encoding and a segment type encoding. The model generates an answer prediction from the encoded visual dialogue input using a first self-attention mask associated with discriminative settings or a second self-attention mask associated with generative settings. Dense annotation fine tuning may be performed to increase accuracy of the answer prediction. The model provides the answer prediction as a response to the current utterance of the human user.
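The two self-attention masks mentioned above can be illustrated with a small sketch. The abstract does not define their shapes; the sketch assumes the common convention that the discriminative setting uses a fully bidirectional mask while the generative setting uses a sequence-to-sequence mask in which answer tokens see the context and only earlier answer tokens. Function names and sequence lengths are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # mask[q][k] == 1 means query position q may attend to key k


def bidirectional_mask(seq_len: int) -> Mask:
    """ASSUMED discriminative setting: every position attends to every position."""
    return [[1] * seq_len for _ in range(seq_len)]


def seq2seq_mask(context_len: int, answer_len: int) -> Mask:
    """ASSUMED generative setting.

    Context positions attend only to the context; answer positions attend to
    the whole context plus answer tokens up to and including themselves.
    """
    n = context_len + answer_len
    mask = [[0] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k < context_len:
                mask[q][k] = 1  # the context is visible to every query
            elif q >= context_len and k <= q:
                mask[q][k] = 1  # causal attention within the answer
    return mask


discriminative = bidirectional_mask(4)
generative = seq2seq_mask(context_len=2, answer_len=2)
```

Switching only the mask lets one set of transformer weights serve both ranking candidate answers and generating an answer token by token.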
-
Publication No.: US20220382527A1
Publication date: 2022-12-01
Application No.: US17459968
Filing date: 2021-08-27
Applicant: salesforce.com, inc.
Inventor: Yue Wang, Weishi Wang, Shafiq Rayhan Joty, Chu Hong Hoi
Abstract: Embodiments described herein provide a code generation and understanding model that builds on a Transformer-based encoder-decoder framework. The code generation and understanding model is configured to derive generic representations for programming language (PL) and natural language (NL) in the code domain via pre-training on an unlabeled code corpus, and then to benefit many code-related downstream tasks with fine-tuning. Apart from the denoising sequence-to-sequence objectives widely adopted for pre-training on natural language, an identifier tagging and prediction pre-training objective is adopted to enable the model to better leverage the crucial token type information from PL, specifically the identifiers assigned by developers.
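The labels needed for an identifier tagging objective can be illustrated with Python's standard `tokenize` and `keyword` modules. This sketch only shows how per-token binary labels might be derived for Python source; it is not the patent's tokenizer or training procedure, and the function name `tag_identifiers` is hypothetical.

```python
import io
import keyword
import tokenize
from typing import List, Tuple

# Layout-only tokens that carry no code text worth tagging.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
           tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}


def tag_identifiers(source: str) -> List[Tuple[str, int]]:
    """Return (token, label) pairs, label 1 for identifiers and 0 otherwise.

    ASSUMPTION: an identifier is any NAME token that is not a Python keyword,
    so built-in names such as `print` are also tagged 1.
    """
    tagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        is_identifier = (tok.type == tokenize.NAME
                         and not keyword.iskeyword(tok.string))
        tagged.append((tok.string, int(is_identifier)))
    return tagged


labels = tag_identifiers("total = price + tax\n")
```

A model trained on such labels would predict the 0/1 tag for each token, which is one way to expose the token type information the abstract refers to.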
-
Publication No.: US20220237403A1
Publication date: 2022-07-28
Application No.: US17161378
Filing date: 2021-01-28
Applicant: salesforce.com, inc.
Inventor: Pan Zhou, Peng Tang, Ran Xu, Chu Hong Hoi
Abstract: A system uses a neural network based model to perform scene text recognition. The system achieves high accuracy of prediction of text from scenes based on a neural network architecture that uses double attention mechanism. The neural network based model includes a convolutional neural network component that outputs a set of visual features and an attention extractor neural network component that determines attention scores based on the visual features. The visual features and the attention scores are combined to generate mixed features that are provided as input to a character recognizer component that determines a second attention score and recognizes the characters based on the second attention score. The system trains the neural network based model by adjusting the neural network parameters to minimize a multi-class gradient harmonizing mechanism (GHM) loss. The multi-class GHM loss varies based on a level of difficulty of the sample.
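The idea of a loss that varies with sample difficulty can be sketched along the lines of gradient harmonizing. The abstract does not give the multi-class formula, so everything below is an assumption: difficulty is taken as g = 1 - p_true, samples are grouped into equal-width bins of g, and each sample is weighted inversely to how crowded its bin is. Function names, `num_bins`, and the probabilities are hypothetical.

```python
import math
from typing import List


def ghm_weights(true_class_probs: List[float], num_bins: int = 10) -> List[float]:
    """Weight each sample inversely to the population of its difficulty bin.

    ASSUMPTION: difficulty g = 1 - p_true, binned into equal-width intervals.
    """
    difficulties = [1.0 - p for p in true_class_probs]
    bins = [min(int(g * num_bins), num_bins - 1) for g in difficulties]
    counts = [0] * num_bins
    for b in bins:
        counts[b] += 1
    nonempty = sum(1 for c in counts if c > 0)
    total = len(true_class_probs)
    return [total / (counts[b] * nonempty) for b in bins]


def ghm_loss(true_class_probs: List[float], num_bins: int = 10) -> float:
    """Density-weighted cross-entropy averaged over the batch."""
    weights = ghm_weights(true_class_probs, num_bins)
    losses = [-math.log(p) for p in true_class_probs]  # requires every p > 0
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Three easy characters crowd one bin; one hard character sits alone in another.
probs = [0.95, 0.92, 0.97, 0.25]
weights = ghm_weights(probs)
loss = ghm_loss(probs)
```

With these numbers the three easy samples share a weight of 2/3 each and the single hard sample gets weight 2, so the abundant easy samples do not dominate the gradient.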
-
Publication No.: US20220108688A1
Publication date: 2022-04-07
Application No.: US17162624
Filing date: 2021-01-29
Applicant: salesforce.com, inc.
Inventor: Guangsen Wang, Chu Hong Hoi, Genta Indra Winata
IPC: G10L15/16, G10L15/065, G10L15/06, G06N3/04, G06N3/08
Abstract: Embodiments described herein provide an Adapt-and-Adjust (A2) mechanism for a multilingual speech recognition model that combines both adaptation and adjustment methods in integrated end-to-end training to improve the model's generalization and mitigate the long-tailed issue. Specifically, the multilingual language model mBERT is utilized and converted into an autoregressive transformer decoder. In addition, a cross-attention module is added to the encoder on top of mBERT's self-attention layer in order to explore the acoustic space in addition to the text space. The joint training of the encoder and mBERT decoder can bridge the semantic gap between the speech and the text.
-
Publication No.: US11836037B2
Publication date: 2023-12-05
Application No.: US17476892
Filing date: 2021-09-16
Applicant: salesforce.com, inc.
Inventor: Amrita Saha, Chu Hong Hoi
IPC: G06F11/07
CPC classification number: G06F11/079, G06F11/0706
Abstract: Some embodiments of the current disclosure disclose methods and systems for analyzing root causes of an incident disrupting information technology services such as cloud services. In some embodiments, a set of problem review board (PRB) documents including information about said incidents may be parsed using a natural language processing (NLP) neural model to extract structured PRB data from the unstructured investigative information contained in the PRB documents. The structured PRB data may include symptoms of the incident, root causes of the incident, resolutions of the incidents, etc., and a causal knowledge graph causally relating the symptoms, root causes, resolutions of the incidents may be generated.
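Once structured PRB data has been extracted, a causal knowledge graph of the kind described above could be assembled as a set of labeled edges. The sketch below skips the NLP extraction entirely and starts from already structured records; the record fields, relation names, function names, and incident data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (head, relation, tail)


def build_causal_graph(records: List[Dict[str, List[str]]]) -> Set[Edge]:
    """Link root causes to the symptoms they cause and to their resolutions.

    ASSUMPTION: each record already holds the lists "symptoms", "root_causes"
    and "resolutions" produced by an upstream extraction step.
    """
    edges: Set[Edge] = set()
    for record in records:
        for cause in record["root_causes"]:
            for symptom in record["symptoms"]:
                edges.add((cause, "causes", symptom))
            for resolution in record["resolutions"]:
                edges.add((resolution, "resolves", cause))
    return edges


def root_causes_for(edges: Set[Edge], symptom: str) -> List[str]:
    """Look up every root cause recorded as causing the given symptom."""
    return sorted({h for h, rel, t in edges if rel == "causes" and t == symptom})


# Invented example records, standing in for parsed PRB documents.
records = [
    {"symptoms": ["high latency"], "root_causes": ["connection pool exhausted"],
     "resolutions": ["increase pool size"]},
    {"symptoms": ["high latency", "timeouts"], "root_causes": ["disk full"],
     "resolutions": ["purge old logs"]},
]

graph = build_causal_graph(records)
candidates = root_causes_for(graph, "high latency")
```

Querying the graph by symptom returns candidate root causes seen in past incidents, which is the kind of lookup a root cause analysis system would build on.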