-
Publication No.: US11481552B2
Publication date: 2022-10-25
Application No.: US17011939
Filing date: 2020-09-03
Applicant: salesforce.com, inc.
Inventor: Ben Krause, Akhilesh Deepak Gotmare
IPC: G06F40/274, G06F40/284, G10L15/183, G06F40/216
Abstract: The embodiments describe generative-discriminative (GeDi) language modeling for determining a next token in a text sequence. A class conditional language model and a positive control code determine a first class conditional probability for each token candidate. The class conditional language model and a negative control code determine a second class conditional probability for each token candidate. A logarithmic probability difference between the first class conditional probability and the second class conditional probability is determined for each token candidate. An unconditional language model determines an unconditional probability for each token candidate. A combined probability is determined by combining the unconditional probability and the logarithmic probability difference for each token candidate. The next token is selected from the token candidates based on the combined probabilities of the token candidates.
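The token-selection steps in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the unconditional probability and the log-probability difference are combined, so the rule used here (adding a weighted log difference to the unconditional log probability, then renormalizing) is an assumption, as are the function name `gedi_next_token`, the `weight` parameter, the greedy selection, and the toy probabilities.

```python
import math

def gedi_next_token(p_uncond, p_pos, p_neg, weight=1.0):
    """Pick a next token from per-candidate probabilities.

    p_uncond: unconditional LM probability per token candidate
    p_pos:    class-conditional probability under the positive control code
    p_neg:    class-conditional probability under the negative control code
    weight:   hypothetical scaling of the log difference (not from the abstract)
    """
    scores = {}
    for tok, p_lm in p_uncond.items():
        # Logarithmic probability difference between the two control codes.
        log_diff = math.log(p_pos[tok]) - math.log(p_neg[tok])
        # Assumed combination rule: work in log space and add the weighted difference.
        scores[tok] = math.log(p_lm) + weight * log_diff

    # Renormalize the combined scores into probabilities (softmax).
    top = max(scores.values())
    exp_scores = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exp_scores.values())
    combined = {t: v / total for t, v in exp_scores.items()}

    # Greedy selection; sampling from `combined` would be another option.
    best = max(combined, key=combined.get)
    return best, combined

# Toy numbers, made up for illustration only.
p_uncond = {"great": 0.3, "awful": 0.5, "okay": 0.2}
p_pos = {"great": 0.6, "awful": 0.1, "okay": 0.3}
p_neg = {"great": 0.1, "awful": 0.7, "okay": 0.2}
token, combined = gedi_next_token(p_uncond, p_pos, p_neg)
```

In this toy case the unconditional model prefers "awful", but the log difference favors "great" strongly enough that the combined distribution selects it.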
-
Publication No.: US20230110719A1
Publication date: 2023-04-13
Application No.: US17589623
Filing date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Ben Krause, Ali Madani
Abstract: Embodiments are directed to finetuning a pre-trained language model using generative fitness finetuning. The generative fitness finetuning reuses a probability distribution learned during unsupervised training of the pre-trained language model to finetune and assay labeled data. The generative fitness finetuning trains the language model to classify a relative fitness of protein sequence pairs based on the corresponding probability of the protein sequences in the pairs. The generative fitness finetuning identifies protein sequences in the pairs with a higher probability as also having higher fitness. The trained and finetuned language model identifies fitness of a protein sequence.
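One way to picture the pairwise objective described in this abstract is the sketch below. It assumes the relative-fitness classifier is a logistic function of the difference between the two sequences' log probabilities under the language model, trained with a cross-entropy loss; the abstract does not state this form, and the names `prob_first_is_fitter` and `pairwise_loss` and the example values are hypothetical.

```python
import math

def prob_first_is_fitter(logp_a, logp_b):
    """Assumed classifier: sigmoid of the log-probability difference."""
    return 1.0 / (1.0 + math.exp(-(logp_a - logp_b)))

def pairwise_loss(logp_a, logp_b, a_is_fitter):
    """Cross-entropy loss for one labeled protein sequence pair."""
    p = prob_first_is_fitter(logp_a, logp_b)
    return -math.log(p if a_is_fitter else 1.0 - p)

# Made-up sequence log probabilities from a language model.
logp_a, logp_b = -10.0, -12.0
loss_when_a_fitter = pairwise_loss(logp_a, logp_b, a_is_fitter=True)
loss_when_b_fitter = pairwise_loss(logp_a, logp_b, a_is_fitter=False)
```

Under this assumed form, the loss is small when the sequence the model already assigns higher probability is also the one labeled as fitter, which matches the abstract's statement that higher-probability sequences are identified as having higher fitness.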
-
Publication No.: US20210374341A1
Publication date: 2021-12-02
Application No.: US17011939
Filing date: 2020-09-03
Applicant: salesforce.com, inc.
Inventor: Ben Krause, Akhilesh Deepak Gotmare
IPC: G06F40/274, G06F40/284, G06F40/216
Abstract: The embodiments describe generative-discriminative (GeDi) language modeling for determining a next token in a text sequence. A class conditional language model and a positive control code determine a first class conditional probability for each token candidate. The class conditional language model and a negative control code determine a second class conditional probability for each token candidate. A logarithmic probability difference between the first class conditional probability and the second class conditional probability is determined for each token candidate. An unconditional language model determines an unconditional probability for each token candidate. A combined probability is determined by combining the unconditional probability and the logarithmic probability difference for each token candidate. The next token is selected from the token candidates based on the combined probabilities of the token candidates.
-