Systems and methods for unsupervised autoregressive text compression

    Publication No.: US11487939B2

    Publication Date: 2022-11-01

    Application No.: US16549985

    Filing Date: 2019-08-23

    Abstract: Embodiments described herein provide a fully unsupervised model for text compression. Specifically, the unsupervised model is configured to identify an optimal deletion path for each input sequence of texts (e.g., a sentence), and words from the input sequence are gradually deleted along the deletion path. To identify the optimal deletion path, the unsupervised model may adopt a pretrained bidirectional language model (BERT) to score each candidate deletion based on the average perplexity of the resulting sentence, and perform a simple greedy look-ahead tree search to select the best deletion for each step.
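The deletion-path search described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the real method scores candidates with BERT average perplexity, whereas here `make_unigram_scorer` builds a toy smoothed-unigram perplexity function as a stand-in so the greedy look-ahead search logic itself is runnable. All function names are hypothetical.

```python
import math
from collections import Counter

def make_unigram_scorer(corpus_tokens):
    """Toy stand-in for a BERT perplexity scorer: average perplexity
    under a Laplace-smoothed unigram model (hypothetical helper)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    def avg_perplexity(tokens):
        if not tokens:
            return float("inf")
        nll = sum(-math.log((counts[t] + 1) / (total + len(counts)))
                  for t in tokens)
        return math.exp(nll / len(tokens))  # exp of mean neg log-prob
    return avg_perplexity

def _lookahead_score(tokens, score, depth):
    # Best achievable perplexity within `depth` further deletions.
    if depth == 0 or len(tokens) <= 1:
        return score(tokens)
    return min(_lookahead_score(tokens[:i] + tokens[i + 1:], score, depth - 1)
               for i in range(len(tokens)))

def greedy_deletion_path(tokens, score, lookahead=1):
    """Delete one token per step, greedily choosing the deletion whose
    best `lookahead`-step continuation has the lowest perplexity."""
    path = [list(tokens)]
    while len(tokens) > 1:
        best = None
        for i in range(len(tokens)):
            cand = tokens[:i] + tokens[i + 1:]
            s = _lookahead_score(cand, score, lookahead - 1)
            if best is None or s < best[0]:
                best = (s, cand)
        tokens = best[1]
        path.append(list(tokens))
    return path
```

With a one-step look-ahead this reduces to scoring each single deletion directly; larger `lookahead` values trade compute for a search that avoids locally attractive but globally poor deletions.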

    SYSTEMS AND METHODS FOR A K-NEAREST NEIGHBOR BASED MECHANISM OF NATURAL LANGUAGE PROCESSING MODELS

    Publication No.: US20210374488A1

    Publication Date: 2021-12-02

    Application No.: US17090553

    Filing Date: 2020-11-05

    Abstract: Embodiments described herein adopt a k-nearest neighbor (kNN) mechanism over a model's hidden representations to identify training examples closest to a given test example. Specifically, a training set of sequences and a test sequence are received, each of which is mapped to a respective hidden representation vector using a base model. A set of indices for each sequence index that minimizes a distance between the respective hidden state vector and a test hidden state vector is then determined. A weighted k-nearest neighbor probability score can then be computed from the set of indices to generate a probability distribution over labels for the test sequence.
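The kNN scoring step above can be sketched as follows. This is an illustrative assumption-laden sketch, not the claimed method: in a real system the vectors would be hidden states produced by a base model, whereas here they are passed in directly, and the weighting scheme (softmax over negative Euclidean distances) is an assumption, since the abstract only says "weighted".

```python
import math

def knn_label_distribution(train_vecs, train_labels, test_vec, k=3):
    """Return a probability distribution over labels for `test_vec`
    by weighting its k nearest training vectors (hypothetical helper)."""
    # Euclidean distance from the test vector to every training vector.
    dists = [(math.dist(v, test_vec), y)
             for v, y in zip(train_vecs, train_labels)]
    nearest = sorted(dists, key=lambda p: p[0])[:k]
    # Softmax over negative distances: closer neighbors get larger weight.
    weights = [math.exp(-d) for d, _ in nearest]
    z = sum(weights)
    probs = {}
    for w, (_, y) in zip(weights, nearest):
        probs[y] = probs.get(y, 0.0) + w / z
    return probs
```

Aggregating the normalized neighbor weights per label yields the probability distribution over labels that the abstract describes for the test sequence.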
