Generative-discriminative language modeling for controllable text generation

    Publication No.: US11481552B2

    Publication Date: 2022-10-25

    Application No.: US17011939

    Application Date: 2020-09-03

    Abstract: The embodiments describe generative-discriminative (GeDi) language modeling for determining a next token in a text sequence. A class conditional language model and a positive control code determine a first class conditional probability for each token candidate. The class conditional language model and a negative control code determine a second class conditional probability for each token candidate. A logarithmic probability difference between the first class conditional probability and the second class conditional probability is determined for each token candidate. An unconditional language model determines an unconditional probability for each token candidate. A combined probability is determined by combining the unconditional probability and the logarithmic probability difference for each token candidate. The next token is selected from the token candidates based on the combined probabilities of the token candidates.
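
The token-selection steps in this abstract can be illustrated with a minimal sketch. The abstract does not say how the unconditional probability and the logarithmic difference are combined, so the version below assumes they are added in log space, scaled by a hypothetical `weight` parameter, and renormalized. The function name `gedi_next_token`, the dictionary inputs, and the greedy (argmax) selection are likewise assumptions made for illustration, not the patented implementation.

```python
import math


def gedi_next_token(uncond, pos, neg, weight=1.0):
    """Pick the next token from per-token probability dictionaries.

    uncond: probabilities from the unconditional language model
    pos:    class-conditional probabilities under the positive control code
    neg:    class-conditional probabilities under the negative control code
    weight: assumed scaling of the log-probability difference (hypothetical)
    """
    scores = {}
    for token in uncond:
        # Logarithmic probability difference between the two class conditionals.
        log_diff = math.log(pos[token]) - math.log(neg[token])
        # Assumed combination rule: add the weighted difference in log space.
        scores[token] = math.log(uncond[token]) + weight * log_diff

    # Renormalize the combined scores into probabilities (stable softmax).
    top = max(scores.values())
    exps = {token: math.exp(score - top) for token, score in scores.items()}
    total = sum(exps.values())
    combined = {token: value / total for token, value in exps.items()}

    # Greedy selection; sampling from `combined` would also fit the abstract.
    best = max(combined, key=combined.get)
    return best, combined


# Toy three-token vocabulary with made-up probabilities.
uncond = {"great": 0.3, "awful": 0.5, "okay": 0.2}
pos = {"great": 0.6, "awful": 0.1, "okay": 0.3}
neg = {"great": 0.1, "awful": 0.6, "okay": 0.3}

best, combined = gedi_next_token(uncond, pos, neg)
print(best, combined)
```

With `weight=0.0` the log difference is ignored and the choice falls back to the unconditional model's most likely token, which is one way to check that the control signal is what changes the outcome.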

    SYSTEMS AND METHODS FOR FEW-SHOT PROTEIN FITNESS PREDICTION WITH GENERATIVE MODELS

    Publication No.: US20230110719A1

    Publication Date: 2023-04-13

    Application No.: US17589623

    Application Date: 2022-01-31

    Abstract: Embodiments are directed to finetuning a pre-trained language model using generative fitness finetuning. The generative fitness finetuning reuses a probability distribution learned during unsupervised training of the pre-trained language model to finetune and assay labeled data. The generative fitness finetuning trains the language model to classify a relative fitness of protein sequence pairs based on the corresponding probability of the protein sequences in the pairs. The generative fitness finetuning identifies protein sequences in the pairs with a higher probability as also having higher fitness. The trained and finetuned language model identifies fitness of a protein sequence.
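
The pairwise classification idea in this abstract can be sketched as follows. The abstract only states that relative fitness is classified from the sequences' probabilities, with the higher-probability sequence treated as fitter. The sigmoid of the log-likelihood difference and the cross-entropy loss used here are assumptions chosen as one common way to express that idea, and the names `pair_preference` and `pairwise_loss` are hypothetical.

```python
import math


def pair_preference(logp_a, logp_b):
    """Assumed model: P(a is fitter than b) = sigmoid(logp_a - logp_b)."""
    diff = logp_a - logp_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    # Equivalent form that avoids overflow for large negative differences.
    e = math.exp(diff)
    return e / (1.0 + e)


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(logp_a, logp_b, a_is_fitter):
    """Cross-entropy of the pair label under the assumed sigmoid model."""
    diff = logp_a - logp_b
    # -log(sigmoid(diff)) = softplus(-diff); -log(1 - sigmoid(diff)) = softplus(diff)
    return _softplus(-diff) if a_is_fitter else _softplus(diff)


# Made-up sequence log-likelihoods from a language model.
logp_a, logp_b = -5.0, -9.0
print(pair_preference(logp_a, logp_b))
print(pairwise_loss(logp_a, logp_b, a_is_fitter=True))
```

In this sketch, finetuning would lower the loss by raising the model's log-likelihood for the sequence that the assay labels as fitter, relative to the other sequence in the pair.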

    GENERATIVE-DISCRIMINATIVE LANGUAGE MODELING FOR CONTROLLABLE TEXT GENERATION

    Publication No.: US20210374341A1

    Publication Date: 2021-12-02

    Application No.: US17011939

    Application Date: 2020-09-03

    Abstract: The embodiments describe generative-discriminative (GeDi) language modeling for determining a next token in a text sequence. A class conditional language model and a positive control code determine a first class conditional probability for each token candidate. The class conditional language model and a negative control code determine a second class conditional probability for each token candidate. A logarithmic probability difference between the first class conditional probability and the second class conditional probability is determined for each token candidate. An unconditional language model determines an unconditional probability for each token candidate. A combined probability is determined by combining the unconditional probability and the logarithmic probability difference for each token candidate. The next token is selected from the token candidates based on the combined probabilities of the token candidates.
