Patent search ap:("Salesforce Page Inc.") AND inv:"Shruthan Radhakrishna"

1.

Invention application
AUTOMATED DATA EXTRACTION PIPELINE FOR LARGE LANGUAGE MODEL TRAINING (Active)

Publication (Announcement) No.: US20250060944A1

Publication (Announcement) Date: 2025-02-20

Application No.: US18449498

Filing Date: 2023-08-14

Applicant: Salesforce, Inc.

Inventor: Shruthan Radhakrishna, Hadi Minooei, Yazdan Jamshidi

IPC: G06F8/33, G06F40/55

Abstract: An automated data extraction pipeline for large language model (LLM) training may include extracting a set of code segments from a set of natural language question-answer (Q&A) combinations that each include a provided input, a provided output, and a provided code segment formatted to transform the provided input into the provided output. The data extraction pipeline may then generate a predicted output from a question portion of a first natural language Q&A combination using a first LLM. A first extracted code segment from the extracted set of code segments may then be executed to generate a first actual output of the first extracted code segment. One or more data samples may then be generated for training a second LLM based on a comparison of the first actual output to the predicted output. The second LLM may then be trained using the one or more data samples.
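The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, which the abstract does not specify. The record layout (`QAPair`, `Sample`), the `<code>...</code>` delimiters, the convention that each code segment defines a function named `transform`, and the `stub_llm` stand-in for the first LLM are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    """Hypothetical record for one natural language Q&A combination."""
    question: str
    answer: str          # assumed to hold the code segment inside <code>...</code>
    provided_input: str


@dataclass
class Sample:
    """Hypothetical training sample for the second LLM."""
    prompt: str
    code: str
    actual_output: str
    predicted_output: str
    prediction_matches: bool


CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def extract_code(answer: str) -> str:
    """Return the first code segment in an answer, or '' if there is none."""
    match = CODE_RE.search(answer)
    return match.group(1).strip() if match else ""


def run_segment(code: str, provided_input: str) -> str:
    """Execute a segment and return its actual output as a string.

    Assumption: the segment defines transform(text). exec() is unsafe for
    untrusted code; a real pipeline would need a sandbox.
    """
    namespace: dict = {}
    exec(code, namespace)
    return str(namespace["transform"](provided_input))


def build_samples(pairs: List[QAPair],
                  predict: Callable[[str], str]) -> List[Sample]:
    """Compare each predicted output with the executed output."""
    samples = []
    for pair in pairs:
        code = extract_code(pair.answer)
        if not code:
            continue  # nothing to execute for this Q&A combination
        predicted = predict(pair.question)               # first LLM (stubbed)
        actual = run_segment(code, pair.provided_input)  # ground truth
        samples.append(
            Sample(pair.question, code, actual, predicted, predicted == actual)
        )
    return samples


def stub_llm(question: str) -> str:
    """Stand-in for the first LLM; always returns a fixed prediction."""
    return "HELLO"


pairs = [
    QAPair(
        question="How do I upper-case the string 'hello'?",
        answer="Use this: <code>def transform(text):\n    return text.upper()</code>",
        provided_input="hello",
    )
]
samples = build_samples(pairs, stub_llm)
```

In this sketch the comparison result is stored as a flag on each sample. How the comparison actually shapes the training data for the second LLM is left open by the abstract, so the flag is only one plausible choice.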
