Systems and methods for video representation learning with a weak teacher

    Publication number: US12210976B2

    Publication date: 2025-01-28

    Application number: US17219339

    Filing date: 2021-03-31

    Abstract: Embodiments described herein provide systems and methods for learning representation from unlabeled videos. Specifically, a method may comprise generating a set of strongly-augmented samples and a set of weakly-augmented samples from the unlabeled video samples; generating a set of predictive logits by inputting the set of strongly-augmented samples into a student model and a first teacher model; generating a set of artificial labels by inputting the set of weakly-augmented samples to a second teacher model that operates in parallel to the first teacher model, wherein the second teacher model shares one or more model parameters with the first teacher model; computing a loss objective based on the set of predictive logits and the set of artificial labels; updating student model parameters based on the loss objective via backpropagation; and updating the shared parameters for the first teacher model and the second teacher model based on the updated student model parameters.
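
The last step of the method, updating the shared teacher parameters from the updated student, can be sketched as follows. The abstract does not state the update rule; the exponential moving average used here, the `momentum` value, and the name `ema_update` are assumptions for illustration only. Both teachers are assumed to read from the same parameter dictionary, which is how the sharing is modeled in this sketch.

```python
def ema_update(teacher_params, student_params, momentum=0.99):
    """Return new shared teacher parameters as an EMA of the student's.

    Assumption: an exponential moving average is one common way to derive
    teacher weights from a student; the abstract does not name the rule.
    """
    return {
        name: momentum * teacher_params[name]
        + (1.0 - momentum) * student_params[name]
        for name in teacher_params
    }


# Hypothetical scalar "parameters" shared by both teacher models.
shared_teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
shared_teacher = ema_update(shared_teacher, student, momentum=0.9)
```

With a momentum close to 1, the teachers change slowly and so provide more stable targets than the student itself.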

    SYSTEMS AND METHODS FOR NON-STATIONARY TIME-SERIES FORECASTING

    Publication number: US20230376746A1

    Publication date: 2023-11-23

    Application number: US17939085

    Filing date: 2022-09-07

    CPC classification number: G06N3/08 G06N3/0481

    Abstract: Embodiments described herein provide a time-index model for forecasting time-series data. The architecture of the model takes a normalized time index as an input, uses a model, g_φ, to produce a vector representation of the time-index, and uses a “ridge regressor” which takes the vector representation and provides an estimated value. The model may be trained on a time-series dataset. The ridge regressor is trained for a given g_φ to reproduce a given lookback window. g_φ is trained over time-indexes in a horizon window, such that g_φ and the corresponding ridge regressor will accurately predict the data in the horizon window. Once g_φ is sufficiently trained, the ridge regressor can be updated based on that final g_φ over a lookback window comprising the time-indexes with the last known values. The final g_φ together with the updated ridge regressor can be given time-indexes past the known values, thereby providing forecasted values.
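
The ridge-regressor step can be illustrated with a minimal sketch. It assumes a fixed, hand-written feature map `features(t) = [t, 1]` as a stand-in for the trained g_φ, so only the closed-form fit over a lookback window and the prediction at a later time index are shown. The function names and the regularization strength `lam` are hypothetical.

```python
def features(t):
    # Hypothetical stand-in for g_phi: maps a normalized time index to a vector.
    return [t, 1.0]


def fit_ridge(time_indexes, values, lam=1e-9):
    """Closed-form ridge fit for two features: solve (G^T G + lam*I) w = G^T y."""
    a = b = d = r0 = r1 = 0.0
    for t, y in zip(time_indexes, values):
        g0, g1 = features(t)
        a += g0 * g0
        b += g0 * g1
        d += g1 * g1
        r0 += g0 * y
        r1 += g1 * y
    a += lam
    d += lam
    det = a * d - b * b
    # Apply the inverse of the 2x2 matrix [[a, b], [b, d]] to [r0, r1].
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]


def predict(weights, t):
    g = features(t)
    return weights[0] * g[0] + weights[1] * g[1]


# Lookback window with known values, then a forecast past the last known index.
lookback_t = [0.0, 0.25, 0.5, 0.75]
lookback_y = [1.0, 1.5, 2.0, 2.5]
w = fit_ridge(lookback_t, lookback_y)
forecast = predict(w, 1.0)
```

Because the ridge fit has a closed form, it can be recomputed cheaply whenever the lookback window moves, while the feature map stays fixed.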

    Unified vision and dialogue transformer with BERT

    Publication number: US11562147B2

    Publication date: 2023-01-24

    Application number: US16929738

    Filing date: 2020-07-15

    Abstract: A visual dialogue model receives image input and text input that includes a dialogue history between the model and a current utterance by a human user. The model generates a unified contextualized representation using a transformer encoder network, in which the unified contextualized representation includes a token level encoding of the image input and text input. The model generates an encoded visual dialogue input from the unified contextualized representation using visual dialogue encoding layers. The encoded visual dialogue input includes a position level encoding and a segment type encoding. The model generates an answer prediction from the encoded visual dialogue input using a first self-attention mask associated with discriminative settings or a second self-attention mask associated with generative settings. Dense annotation fine tuning may be performed to increase accuracy of the answer prediction. The model provides the answer prediction as a response to the current utterance of the human user.
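
The two self-attention masks can be sketched as below. The abstract does not define them, so this assumes a common convention: the discriminative mask lets every token attend to every other token, while the generative mask hides answer tokens from the context and lets each answer token see only the context and the answer tokens up to its own position. The function name `build_mask` and its arguments are hypothetical.

```python
def build_mask(n_context, n_answer, generative):
    """Return an n x n matrix where mask[i][j] == 1 means token i may attend to token j.

    Assumption: discriminative = fully bidirectional,
    generative = sequence-to-sequence style (left-to-right over the answer).
    """
    n = n_context + n_answer
    mask = [[1] * n for _ in range(n)]
    if generative:
        for i in range(n):
            for j in range(n_context, n):
                # Context tokens never see answer tokens;
                # answer tokens never see later answer tokens.
                if i < n_context or j > i:
                    mask[i][j] = 0
    return mask


discriminative_mask = build_mask(2, 2, generative=False)
generative_mask = build_mask(2, 2, generative=True)
```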

    SYSTEMS AND METHODS FOR CODE UNDERSTANDING AND GENERATION

    Publication number: US20220382527A1

    Publication date: 2022-12-01

    Application number: US17459968

    Filing date: 2021-08-27

    Abstract: Embodiments described herein provide a code generation and understanding model that builds on a Transformer-based encoder-decoder framework. The code generation and understanding model is configured to derive generic representations for programming language (PL) and natural language (NL) in the code domain via pre-training on an unlabeled code corpus, and then to benefit many code-related downstream tasks with fine-tuning. Apart from the denoising sequence-to-sequence objectives widely adopted for pre-training on natural language, an identifier tagging and prediction pre-training objective is adopted to enable the model to better leverage the crucial token type information from PL, specifically the identifiers assigned by developers.
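
To make the identifier tagging objective concrete, the sketch below produces the kind of binary token labels such an objective could be trained on. It is an illustration only: it handles Python source using the standard `tokenize` and `keyword` modules, and the rule "a NAME token that is not a keyword is an identifier" is an assumption, not the labeling procedure of the application.

```python
import io
import keyword
import tokenize

# Token types that carry layout rather than code content.
_SKIP = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.COMMENT,
    tokenize.ENDMARKER,
}


def tag_identifiers(source):
    """Return (token, label) pairs where label 1 marks an identifier."""
    tags = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        is_identifier = tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        tags.append((tok.string, 1 if is_identifier else 0))
    return tags


labels = tag_identifiers("total = price + 1\n")
```

Under this rule, built-in names such as `print` are also labeled as identifiers, since they are not keywords.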

    NEURAL NETWORK BASED SCENE TEXT RECOGNITION

    Publication number: US20220237403A1

    Publication date: 2022-07-28

    Application number: US17161378

    Filing date: 2021-01-28

    Abstract: A system uses a neural network based model to perform scene text recognition. The system achieves high accuracy of prediction of text from scenes based on a neural network architecture that uses a double attention mechanism. The neural network based model includes a convolutional neural network component that outputs a set of visual features and an attention extractor neural network component that determines attention scores based on the visual features. The visual features and the attention scores are combined to generate mixed features that are provided as input to a character recognizer component that determines a second attention score and recognizes the characters based on the second attention score. The system trains the neural network based model by adjusting the neural network parameters to minimize a multi-class gradient harmonizing mechanism (GHM) loss. The multi-class GHM loss varies based on a level of difficulty of the sample.
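
The idea of a loss that varies with sample difficulty can be sketched as follows. This follows the general gradient harmonizing idea, in which samples are binned by gradient norm and samples in crowded bins are down-weighted; it is not the multi-class formulation of the application, which the abstract does not spell out. The function names, the bin count, and the use of `1 - p_true` as the difficulty measure are assumptions.

```python
import math


def ghm_weights(p_true, bins=10):
    """Weight each sample inversely to how many samples share its difficulty bin.

    p_true[i] is the predicted probability of sample i's correct class.
    Assumption: difficulty is 1 - p_true, the magnitude of the cross-entropy
    gradient with respect to the true-class logit under a softmax.
    """
    n = len(p_true)
    bin_index = [min(int((1.0 - p) * bins), bins - 1) for p in p_true]
    counts = [0] * bins
    for i in bin_index:
        counts[i] += 1
    nonempty = sum(1 for c in counts if c > 0)
    return [n / (counts[i] * nonempty) for i in bin_index]


def ghm_loss(p_true, bins=10):
    """Cross-entropy averaged with the density-based weights above."""
    weights = ghm_weights(p_true, bins)
    return sum(w * -math.log(p) for w, p in zip(weights, p_true)) / len(p_true)


# Three easy samples crowd one bin; one hard sample sits alone in another.
probabilities = [0.95, 0.95, 0.95, 0.2]
weights = ghm_weights(probabilities)
loss = ghm_loss(probabilities)
```

In this example the three easy samples share a bin and each receive a smaller weight than the single hard sample, so the many easy samples do not dominate the update.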

    SYSTEMS AND METHODS FOR A MULTILINGUAL SPEECH RECOGNITION FRAMEWORK

    Publication number: US20220108688A1

    Publication date: 2022-04-07

    Application number: US17162624

    Filing date: 2021-01-29

    Abstract: Embodiments described herein provide an Adapt-and-Adjust (A2) mechanism for a multilingual speech recognition model that combines both adaptation and adjustment methods in an integrated end-to-end training to improve the model's generalization and mitigate the long-tailed issue. Specifically, a multilingual language model, mBERT, is utilized and converted into an autoregressive transformer decoder. In addition, a cross-attention module is added to the encoder on top of the mBERT's self-attention layer in order to explore the acoustic space in addition to the text space. The joint training of the encoder and mBERT decoder can bridge the semantic gap between the speech and the text.

    Systems and methods for artificial intelligence-based root cause analysis of service incidents

    Publication number: US11836037B2

    Publication date: 2023-12-05

    Application number: US17476892

    Filing date: 2021-09-16

    CPC classification number: G06F11/079 G06F11/0706

    Abstract: Some embodiments of the current disclosure disclose methods and systems for analyzing root causes of an incident disrupting information technology services such as cloud services. In some embodiments, a set of problem review board (PRB) documents including information about said incidents may be parsed using a natural language processing (NLP) neural model to extract structured PRB data from the unstructured investigative information contained in the PRB documents. The structured PRB data may include symptoms of the incident, root causes of the incident, resolutions of the incident, etc., and a causal knowledge graph causally relating the symptoms, root causes, and resolutions of the incidents may be generated.
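
Once structured PRB data has been extracted, assembling it into a graph can be sketched as below. The record layout (`symptoms`, `root_causes`, `resolutions` keys), the edge labels, and the example incident are all hypothetical; the abstract does not specify the graph schema, and the NLP extraction step is not shown.

```python
def build_causal_graph(records):
    """Return a set of (source, relation, target) edges from structured records.

    Assumption: each root cause is linked to every symptom and every
    resolution recorded for the same incident.
    """
    edges = set()
    for record in records:
        for cause in record["root_causes"]:
            for symptom in record["symptoms"]:
                edges.add((cause, "causes", symptom))
            for resolution in record["resolutions"]:
                edges.add((cause, "resolved_by", resolution))
    return edges


def causes_of(edges, symptom):
    """Look up candidate root causes for an observed symptom."""
    return sorted(src for src, rel, dst in edges if rel == "causes" and dst == symptom)


# Hypothetical structured record for one incident.
records = [
    {
        "symptoms": ["high latency"],
        "root_causes": ["connection pool exhausted"],
        "resolutions": ["increase pool size"],
    }
]
graph = build_causal_graph(records)
candidates = causes_of(graph, "high latency")
```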
