SYSTEMS AND METHODS FOR NATURAL LANGUAGE CODE SEARCH

    Publication Number: US20230109681A1

    Publication Date: 2023-04-13

    Application Number: US17587984

    Application Date: 2022-01-28

    Abstract: Embodiments are directed to translating a natural language query into a code snippet in a programming language that semantically represents the query. The embodiments include a cascading neural network that includes an encoder network and a classifier network. The encoder network is faster but less accurate than the classifier network. The encoder network is trained using a contrastive learning framework to identify code candidates from a large set of code snippets. The classifier network is trained as a binary classifier to identify, from the code candidates, the code snippet that semantically represents the query.
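
    A minimal sketch of the cascaded retrieval described above is shown below (PyTorch is assumed; `classifier` is a hypothetical stand-in for the slower binary classifier network, and `code_embs` are assumed to be precomputed by the faster contrastively trained encoder).

```python
import torch
import torch.nn.functional as F

def search(query_emb, code_embs, code_snippets, classifier, k=10):
    """Stage 1: fast encoder retrieval; Stage 2: slower binary reranking of candidates."""
    # Stage 1: cosine similarity between the query embedding and all code embeddings.
    sims = F.cosine_similarity(query_emb.unsqueeze(0), code_embs, dim=-1)
    _, top_idx = sims.topk(k)
    # Stage 2: the binary classifier scores each (query, candidate) pair as match / no-match.
    probs = torch.stack([classifier(query_emb, code_embs[i]) for i in top_idx]).sigmoid()
    best = top_idx[probs.argmax()].item()
    return code_snippets[best]
```

    Only the k candidates surfaced by the cheap encoder ever reach the more expensive classifier, which is the point of the cascade.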

    SYSTEMS AND METHODS FOR VISION-AND-LANGUAGE REPRESENTATION LEARNING

    Publication Number: US20220391755A1

    Publication Date: 2022-12-08

    Application Number: US17370524

    Application Date: 2021-07-08

    Abstract: Embodiments described herein provide vision-and-language (V+L) systems and methods for learning vision and language representations. Specifically, a method may comprise receiving a training dataset comprising a plurality of image samples and a plurality of text samples; encoding the plurality of image samples into a plurality of encoded image samples and the plurality of text samples into a plurality of encoded text samples; computing a first loss objective based on the plurality of encoded image samples and the plurality of encoded text samples; encoding a first subset of the plurality of encoded image samples and a second subset of the plurality of encoded text samples into a plurality of encoded image-text samples; computing a second loss objective based on the plurality of encoded image-text samples; and updating the V+L model based at least in part on the first loss objective and the second loss objective.
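
    As a hedged illustration of the two-loss training step, the sketch below assumes PyTorch; `img_enc`, `txt_enc`, `fusion`, and `match_head` are hypothetical modules standing in for the image encoder, text encoder, multimodal encoder, and pair-scoring head, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def vl_training_step(images, texts, img_enc, txt_enc, fusion, match_head, optimizer, tau=0.07):
    img_emb = F.normalize(img_enc(images), dim=-1)   # encoded image samples
    txt_emb = F.normalize(txt_enc(texts), dim=-1)    # encoded text samples

    # First loss objective: image-text contrastive loss over the batch.
    logits = img_emb @ txt_emb.t() / tau
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_contrastive = (F.cross_entropy(logits, targets) +
                        F.cross_entropy(logits.t(), targets)) / 2

    # Second loss objective: encode (image, text) pairs jointly and score them with a
    # matching head (only positive pairs are shown; negative pairs are omitted for brevity).
    fused = fusion(img_emb, txt_emb)                 # encoded image-text samples
    match_logits = match_head(fused).squeeze(-1)
    loss_match = F.binary_cross_entropy_with_logits(
        match_logits, torch.ones_like(match_logits))

    # Update the V+L model based on both loss objectives.
    loss = loss_contrastive + loss_match
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```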

    SYSTEMS AND METHODS FOR VIDEO REPRESENTATION LEARNING WITH A WEAK TEACHER

    Publication Number: US20220156593A1

    Publication Date: 2022-05-19

    Application Number: US17219339

    Application Date: 2021-03-31

    Abstract: Embodiments described herein provide systems and methods for learning representation from unlabeled videos. Specifically, a method may comprise generating a set of strongly-augmented samples and a set of weakly-augmented samples from the unlabeled video samples; generating a set of predictive logits by inputting the set of strongly-augmented samples into a student model and a first teacher model; generating a set of artificial labels by inputting the set of weakly-augmented samples to a second teacher model that operates in parallel to the first teacher model, wherein the second teacher model shares one or more model parameters with the first teacher model; computing a loss objective based on the set of predictive logits and the set of artificial labels; updating student model parameters based on the loss objective via backpropagation; and updating the shared parameters for the first teacher model and the second teacher model based on the updated student model parameters.
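
    A rough sketch of one training step follows (PyTorch assumed). The two teachers are collapsed into a single `teacher` module here because they share parameters, and `strong_aug`, `weak_aug`, and the momentum coefficient are illustrative assumptions rather than the claimed configuration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, m=0.99):
    # The shared teacher parameters follow the updated student parameters
    # via an exponential moving average.
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.data.mul_(m).add_(sp.data, alpha=1 - m)

def train_step(videos, student, teacher, strong_aug, weak_aug, optimizer):
    strong = strong_aug(videos)                  # strongly-augmented samples
    weak = weak_aug(videos)                      # weakly-augmented samples

    logits = student(strong)                     # predictive logits
    with torch.no_grad():
        pseudo = teacher(weak).softmax(dim=-1)   # artificial labels from the weak view

    # Loss objective between the predictive logits and the artificial labels.
    loss = F.kl_div(logits.log_softmax(dim=-1), pseudo, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ema_update(teacher, student)                 # propagate the student update to the teacher
    return loss.item()
```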

    System and method for learning with noisy labels as semi-supervised learning

    Publication Number: US11599792B2

    Publication Date: 2023-03-07

    Application Number: US16688104

    Application Date: 2019-11-19

    Abstract: A method provides learning with noisy labels. The method includes generating a first network of a machine learning model with a first set of parameter initial values, and generating a second network of the machine learning model with a second set of parameter initial values. First clean probabilities for samples in a training dataset are generated using the second network. A first labeled dataset and a first unlabeled dataset are generated from the training dataset based on the first clean probabilities. The first network is trained based on the first labeled dataset and first unlabeled dataset to update parameters of the first network.
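
    The abstract does not spell out how the clean probabilities are computed; the sketch below assumes the common approach of fitting a two-component Gaussian mixture to per-sample losses under the other network (PyTorch and scikit-learn assumed; the threshold is illustrative).

```python
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

@torch.no_grad()
def split_by_clean_probability(other_network, inputs, labels, threshold=0.5):
    # Per-sample cross-entropy losses under the other network of the model.
    losses = F.cross_entropy(other_network(inputs), labels,
                             reduction="none").cpu().numpy().reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(losses)
    clean_component = gmm.means_.argmin()               # low-loss component = likely clean
    clean_prob = gmm.predict_proba(losses)[:, clean_component]
    clean_mask = torch.from_numpy(clean_prob > threshold).to(inputs.device)
    labeled = (inputs[clean_mask], labels[clean_mask])   # labeled dataset: labels kept
    unlabeled = inputs[~clean_mask]                       # unlabeled dataset: labels discarded
    return labeled, unlabeled, clean_prob
```

    The first network is then trained on the resulting labeled and unlabeled splits to update its parameters, as described in the abstract.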

    Noise-resistant object detection with noisy annotations

    Publication Number: US11334766B2

    Publication Date: 2022-05-17

    Application Number: US16778339

    Application Date: 2020-01-31

    Abstract: Systems and methods are provided for training object detectors of a neural network model with a mixture of label noise and bounding box noise. According to some embodiments, a learning framework is provided which jointly optimizes object labels, bounding box coordinates, and model parameters by performing alternating noise correction and model training. In some embodiments, to disentangle label noise and bounding box noise, a two-step noise correction method is employed. In some examples, the first step performs class-agnostic bounding box correction by minimizing classifier discrepancy and maximizing region objectness. In some examples, the second step uses dual detection heads for label correction and class-specific bounding box refinement.
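
    As an illustration of the first, class-agnostic correction step, the sketch below treats the box coordinates as variables and updates them by gradient descent to reduce the disagreement between two classifier heads while increasing region objectness; `head_a`, `head_b`, `objectness_fn`, and the step settings are hypothetical stand-ins, not the claimed implementation.

```python
import torch

def correct_box(features, box, head_a, head_b, objectness_fn, steps=10, lr=0.01):
    # Treat the (possibly noisy) box coordinates as learnable variables.
    box = box.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([box], lr=lr)
    for _ in range(steps):
        p_a = head_a(features, box).softmax(dim=-1)
        p_b = head_b(features, box).softmax(dim=-1)
        discrepancy = (p_a - p_b).abs().sum()       # disagreement between the two classifiers
        objectness = objectness_fn(features, box)    # how "object-like" the boxed region is
        loss = discrepancy - objectness              # minimize discrepancy, maximize objectness
        opt.zero_grad()
        loss.backward()
        opt.step()
    return box.detach()
```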

    SYSTEMS AND METHODS FOR PARTIALLY SUPERVISED LEARNING WITH MOMENTUM PROTOTYPES

    Publication Number: US20220067506A1

    Publication Date: 2022-03-03

    Application Number: US17005763

    Application Date: 2020-08-28

    Abstract: A learning mechanism for partially-labeled web images is provided that corrects noisy labels during learning. Specifically, the mechanism employs a momentum prototype that represents the common characteristics of a specific class. One training objective is to minimize the difference between the normalized embedding of a training image sample and the momentum prototype of the corresponding class. Meanwhile, during training, the momentum prototype is used to generate a pseudo-label for the training image sample, which can then be used to identify and remove out-of-distribution (OOD) samples and to correct the noisy labels from the original partially-labeled training images. The momentum prototype for each class is in turn constantly updated based on the embeddings of new training samples and their pseudo-labels.
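
    A minimal sketch of the momentum-prototype step is given below (PyTorch assumed). The prototype buffer layout, momentum coefficient, temperature, and OOD confidence threshold are illustrative assumptions, and using the pseudo-label as the corrected class for the prototype objective is a simplification of the description above.

```python
import torch
import torch.nn.functional as F

def momentum_prototype_step(embeddings, prototypes, m=0.999, tau=0.1, ood_threshold=0.3):
    z = F.normalize(embeddings, dim=-1)                    # normalized embeddings
    protos = F.normalize(prototypes, dim=-1)

    # Pseudo-label: the class whose momentum prototype is most similar to the embedding.
    probs = (z @ protos.t() / tau).softmax(dim=-1)
    pseudo = probs.argmax(dim=-1)
    confidence = probs.max(dim=-1).values
    keep = confidence > ood_threshold                      # drop likely out-of-distribution samples

    # Training objective: minimize the difference between each kept embedding
    # and the momentum prototype of its (corrected) class.
    proto_loss = (1 - F.cosine_similarity(z[keep], protos[pseudo[keep]], dim=-1)).mean()

    # Constantly update the momentum prototypes with the new embeddings.
    with torch.no_grad():
        for e, c in zip(z[keep], pseudo[keep]):
            prototypes[c] = m * prototypes[c] + (1 - m) * e
    return proto_loss, pseudo, keep
```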

    SYSTEMS AND METHODS FOR SEMI-SUPERVISED LEARNING WITH CONTRASTIVE GRAPH REGULARIZATION

    Publication Number: US20220156591A1

    Publication Date: 2022-05-19

    Application Number: US17160896

    Application Date: 2021-01-28

    Abstract: Embodiments described herein provide an approach (referred to as the “Co-training” mechanism throughout this disclosure) that jointly learns two representations of the training data: their class probabilities and low-dimensional embeddings. Specifically, two representations of each image sample are generated: a class probability produced by the classification head and a low-dimensional embedding produced by the projection head. The classification head is trained using memory-smoothed pseudo-labels, where pseudo-labels are smoothed by aggregating information from nearby samples in the embedding space. The projection head is trained using contrastive learning on a pseudo-label graph, where samples with similar pseudo-labels are encouraged to have similar embeddings.
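
    The contrastive-graph term can be sketched as below (PyTorch assumed). The soft pseudo-labels are assumed to be probability vectors; the similarity threshold, temperature, and the use of identical embeddings for the self pair (in practice a second augmented view would be compared) are illustrative simplifications.

```python
import torch
import torch.nn.functional as F

def graph_contrastive_loss(embeddings, pseudo_labels, threshold=0.8, tau=0.1):
    z = F.normalize(embeddings, dim=-1)              # low-dimensional embeddings (projection head)
    # Pseudo-label graph: connect samples whose soft pseudo-labels are similar.
    graph = (pseudo_labels @ pseudo_labels.t() >= threshold).float()
    graph.fill_diagonal_(1.0)
    graph = graph / graph.sum(dim=1, keepdim=True)   # row-normalize into target distributions
    # Embedding-similarity distribution.
    log_q = F.log_softmax(z @ z.t() / tau, dim=1)
    # Samples connected in the pseudo-label graph are encouraged to have similar embeddings.
    return -(graph * log_q).sum(dim=1).mean()
```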

    SYSTEMS AND METHODS FOR INTERPOLATIVE CENTROID CONTRASTIVE LEARNING

    Publication Number: US20220156530A1

    Publication Date: 2022-05-19

    Application Number: US17188232

    Application Date: 2021-03-01

    Abstract: An interpolative centroid contrastive learning (ICCL) framework is disclosed for learning a more discriminative representation for tail classes. Specifically, data samples, such as natural images, are projected into a low-dimensional embedding space, and class centroids for respective classes are created as average embeddings of samples that belong to a respective class. Virtual training samples are then created by interpolating two images from two samplers: a class-agnostic sampler, which returns images from both the head class and the tail class with equal probability, and a class-aware sampler, which focuses more on tail-class images by sampling images from the tail class with a higher probability than images from the head class. The sampled images, e.g., an image from the class-agnostic sampler and an image from the class-aware sampler, may be interpolated to generate the virtual training samples.
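
    A short sketch of the interpolation and centroid-contrastive objective follows (PyTorch assumed). Equal batch sizes from the two samplers, the Beta prior over the mixing coefficient, and the temperature are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def iccl_loss(x_uniform, y_uniform, x_tail, y_tail, encoder, centroids, tau=0.1):
    # Mix an image from the class-agnostic sampler with one from the class-aware sampler.
    lam = torch.distributions.Beta(1.0, 1.0).sample().item()
    x_mix = lam * x_uniform + (1 - lam) * x_tail            # interpolated (virtual) training sample
    z = F.normalize(encoder(x_mix), dim=-1)                  # embedding in the low-dimensional space
    logits = z @ F.normalize(centroids, dim=-1).t() / tau    # similarity to every class centroid
    log_p = F.log_softmax(logits, dim=-1)
    idx = torch.arange(z.size(0))
    # The virtual sample should match both source centroids, weighted by the mixing coefficient.
    return -(lam * log_p[idx, y_uniform] + (1 - lam) * log_p[idx, y_tail]).mean()
```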

    UNSUPERVISED REPRESENTATION LEARNING WITH CONTRASTIVE PROTOTYPES

    Publication Number: US20220156507A1

    Publication Date: 2022-05-19

    Application Number: US17591121

    Application Date: 2022-02-02

    Abstract: The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues to solve the unsupervised learning task. PCL introduces prototypes as latent variables to help find the maximum-likelihood estimate of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step that finds prototypes by clustering and an M-step that optimizes the network with a contrastive loss.
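
    The E-step / M-step alternation can be sketched as follows (PyTorch and scikit-learn assumed; the use of k-means for clustering and the temperature are illustrative choices).

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

@torch.no_grad()
def e_step(embeddings, num_prototypes):
    # E-step: cluster the current embeddings; cluster centers act as prototypes.
    km = KMeans(n_clusters=num_prototypes, n_init=10).fit(embeddings.detach().cpu().numpy())
    prototypes = torch.tensor(km.cluster_centers_, dtype=embeddings.dtype,
                              device=embeddings.device)
    assignments = torch.tensor(km.labels_, dtype=torch.long, device=embeddings.device)
    return F.normalize(prototypes, dim=-1), assignments

def m_step_loss(embeddings, prototypes, assignments, tau=0.1):
    # M-step: contrastive loss pulling each sample toward its assigned prototype.
    z = F.normalize(embeddings, dim=-1)
    logits = z @ prototypes.t() / tau
    return F.cross_entropy(logits, assignments)
```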
