-
Publication No.: US20220374595A1
Publication Date: 2022-11-24
Application No.: US17531591
Application Date: 2021-11-19
Applicant: salesforce.com, inc.
Inventor: Akhilesh Deepak Gotmare, Junnan Li, Shafiq Rayhan Joty, Chu Hong Hoi
IPC: G06F40/226, G06F40/40, G06F40/30, G06F40/151
Abstract: Embodiments described herein provide a contrastive learning framework that leverages hard negative examples, mined globally from the entire training corpus for a given query, to improve the quality of code and natural language representations. Specifically, similar examples from the training corpus are extracted and used as hard negatives in an online manner during training, while minibatch construction remains random.
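A minimal sketch of the idea, under assumptions the abstract does not state: embeddings are plain Python lists, similarity is a dot product, and the objective is an InfoNCE-style softmax cross-entropy with a temperature. The names `mine_hard_negatives`, `info_nce_loss`, and `batch_loss` are hypothetical. The minibatch stays random; for each query, hard negatives are looked up across the whole corpus at loss time, which is one plausible reading of "online" mining.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity between two embeddings (assumed: plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(query: Vector, corpus: Sequence[Vector],
                        positive_idx: int, k: int) -> List[int]:
    """Indices of the k corpus items most similar to `query`, excluding its positive."""
    scored = [(dot(query, c), i) for i, c in enumerate(corpus) if i != positive_idx]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def info_nce_loss(query: Vector, positive: Vector,
                  negatives: Sequence[Vector], temperature: float = 0.05) -> float:
    """Softmax cross-entropy with the positive pair at index 0."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def batch_loss(queries: Sequence[Vector], batch_code_idx: Sequence[int],
               corpus: Sequence[Vector], k: int = 2,
               temperature: float = 0.05) -> float:
    """Mean loss over a randomly built minibatch.

    Negatives for each query are the other in-batch code snippets plus up to k
    hard negatives mined from the entire corpus; corpus items already in the
    batch are skipped so they are not counted twice.
    """
    in_batch_set = set(batch_code_idx)
    losses = []
    for q, pos_idx in zip(queries, batch_code_idx):
        in_batch = [corpus[j] for j in batch_code_idx if j != pos_idx]
        hard = [corpus[j] for j in mine_hard_negatives(q, corpus, pos_idx, k)
                if j not in in_batch_set]
        losses.append(info_nce_loss(q, corpus[pos_idx], in_batch + hard, temperature))
    return sum(losses) / len(losses)


corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # code embeddings
queries = [[1.0, 0.0], [0.0, 1.0]]                          # query embeddings
batch = [0, 2]  # query i's positive code snippet is corpus[batch[i]]
print(mine_hard_negatives(queries[0], corpus, 0, 1))  # → [1]
```

Adding globally mined near-duplicates raises the loss relative to in-batch negatives alone, which is the extra training signal hard negatives are meant to supply. In a real system the corpus embeddings would come from the encoder being trained and would likely be refreshed periodically rather than recomputed every step.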
-
Publication No.: US20210374341A1
Publication Date: 2021-12-02
Application No.: US17011939
Application Date: 2020-09-03
Applicant: salesforce.com, inc.
Inventor: Ben Krause, Akhilesh Deepak Gotmare
IPC: G06F40/274, G06F40/284, G06F40/216
Abstract: The embodiments describe generative-discriminative (GeDi) language modeling for determining the next token in a text sequence. A class-conditional language model and a positive control code determine a first class-conditional probability for each token candidate. The class-conditional language model and a negative control code determine a second class-conditional probability for each token candidate. A logarithmic probability difference between the first and second class-conditional probabilities is determined for each token candidate. An unconditional language model determines an unconditional probability for each token candidate. A combined probability is determined for each token candidate by combining its unconditional probability with its logarithmic probability difference. The next token is selected from the token candidates based on their combined probabilities.
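The combination step can be sketched as follows, following the abstract's wording. Three things here are assumptions, not taken from the source: the unconditional log-probability and the log difference are combined as a weighted sum in log space, `weight` is a hypothetical guidance strength, and selection is greedy. The function names and the toy probabilities are illustrative only.

```python
import math
from typing import Dict


def gedi_combined_distribution(p_uncond: Dict[str, float],
                               p_pos: Dict[str, float],
                               p_neg: Dict[str, float],
                               weight: float = 1.0) -> Dict[str, float]:
    """Combine an unconditional next-token distribution with the log-probability
    difference between positive- and negative-control-code distributions."""
    scores = {}
    for tok, p in p_uncond.items():
        # Logarithmic probability difference for this token candidate.
        log_diff = math.log(p_pos[tok]) - math.log(p_neg[tok])
        # Assumed combination rule: weighted sum in log space.
        scores[tok] = math.log(p) + weight * log_diff
    # Renormalize with a numerically stable softmax.
    m = max(scores.values())
    exp_scores = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exp_scores.values())
    return {tok: e / total for tok, e in exp_scores.items()}


def gedi_next_token(p_uncond: Dict[str, float], p_pos: Dict[str, float],
                    p_neg: Dict[str, float], weight: float = 1.0) -> str:
    """Greedy choice (assumed) over the combined distribution."""
    combined = gedi_combined_distribution(p_uncond, p_pos, p_neg, weight)
    return max(combined, key=combined.get)


# Toy distributions over three candidate tokens (made-up numbers).
p_uncond = {"good": 0.4, "bad": 0.5, "fine": 0.1}
p_pos = {"good": 0.6, "bad": 0.1, "fine": 0.3}  # with positive control code
p_neg = {"good": 0.1, "bad": 0.7, "fine": 0.2}  # with negative control code

print(max(p_uncond, key=p_uncond.get))          # → bad
print(gedi_next_token(p_uncond, p_pos, p_neg))  # → good
```

The unconditional model alone would pick "bad"; the log difference strongly favors tokens the positive control code makes more likely than the negative one, so "good" wins after combination.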
-
Publication No.: US20230109681A1
Publication Date: 2023-04-13
Application No.: US17587984
Application Date: 2022-01-28
Applicant: salesforce.com, inc.
Inventor: Akhilesh Deepak Gotmare, Junnan Li, Chu Hong Hoi
IPC: G06F40/151, G06F40/30, G06F40/40, G06N3/04
Abstract: Embodiments are directed to translating a natural language query into a code snippet, in a programming language, that semantically represents the query. The embodiments include a cascading neural network comprising an encoder network and a classifier network, where the encoder network is faster but less accurate than the classifier network. The encoder network is trained using a contrastive learning framework to identify code candidates from a large set of code snippets. The classifier network is trained as a binary classifier to identify, from among the code candidates, the code snippet that semantically represents the query.
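The cascade can be sketched as a two-stage retrieve-then-rerank loop. This is a minimal sketch under stated assumptions: `cascade_search`, `toy_encode`, `toy_classify`, and the shortlist size `top_k` are hypothetical, and the toy encoder (letter frequencies) and toy classifier (word overlap) merely stand in for the trained encoder and binary classifier networks the abstract describes.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cascade_search(query: str, snippets: Sequence[str],
                   encode: Callable[[str], Vector],
                   classify: Callable[[str, str], float],
                   top_k: int = 3) -> Tuple[str, List[str]]:
    """Two-stage retrieval: cheap encoder shortlist, then pairwise re-ranking."""
    # Stage 1 (fast, less accurate): rank every snippet by embedding similarity.
    # In practice snippet embeddings would be precomputed and indexed once.
    q_vec = encode(query)
    ranked = sorted(snippets, key=lambda s: cosine(q_vec, encode(s)), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2 (slow, more accurate): score each (query, candidate) pair, keep the best.
    best = max(candidates, key=lambda s: classify(query, s))
    return best, candidates


def toy_encode(text: str) -> Vector:
    """Hypothetical stand-in for the encoder network: letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def toy_classify(query: str, code: str) -> float:
    """Hypothetical stand-in for the binary classifier: word-overlap score in [0, 1]."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    c = set(re.findall(r"[a-z]+", code.lower()))
    union = q | c
    return len(q & c) / len(union) if union else 0.0


snippets = [
    "def reverse_list(xs): return xs[::-1]",
    "def sum_list(xs): return sum(xs)",
    "def read_file(path): return open(path).read()",
]
best, shortlist = cascade_search("reverse a list", snippets, toy_encode, toy_classify)
print(best)  # → def reverse_list(xs): return xs[::-1]
```

The split is about cost: the expensive pairwise classifier runs only `top_k` times per query rather than once for every snippet in the corpus, while the fast encoder keeps the shortlist likely to contain the right answer.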
-
Publication No.: US11481552B2
Publication Date: 2022-10-25
Application No.: US17011939
Application Date: 2020-09-03
Applicant: salesforce.com, inc.
Inventor: Ben Krause, Akhilesh Deepak Gotmare
IPC: G06F40/274, G06F40/284, G10L15/183, G06F40/216
Abstract: The embodiments describe generative-discriminative (GeDi) language modeling for determining the next token in a text sequence. A class-conditional language model and a positive control code determine a first class-conditional probability for each token candidate. The class-conditional language model and a negative control code determine a second class-conditional probability for each token candidate. A logarithmic probability difference between the first and second class-conditional probabilities is determined for each token candidate. An unconditional language model determines an unconditional probability for each token candidate. A combined probability is determined for each token candidate by combining its unconditional probability with its logarithmic probability difference. The next token is selected from the token candidates based on their combined probabilities.
-