-
Publication number: US20240256965A1
Publication date: 2024-08-01
Application number: US18424624
Filing date: 2024-01-26
Applicant: Google LLC
Inventor: Hyung Won Chung , Barret Zoph , Dengyong Zhou , Liam Fedus , Shayne Longpre , Le Hou , Yi Tay , Jason Weng Wei , Siddhartha Brahma , Quoc V. Le
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model, a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.
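The training loop described in this abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the patented implementation: the model stub, the token-overlap score standing in for a real loss, and the `trace_weight` parameter are all assumptions introduced here to show how supervising both the response and the intermediate trace might combine into one update.

```python
def token_f1(predicted, target):
    """Toy overlap score between two token sequences (a stand-in for a real loss)."""
    pred, gold = set(predicted), set(target)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


class StubModel:
    """Hypothetical stand-in for the machine-learned sequence processing model."""

    def __init__(self):
        self.loss_history = []

    def generate(self, query):
        # A real model would decode a response plus a trace of intermediate states;
        # here we fabricate both from the query tokens.
        tokens = query.split()
        return tokens + ["answer"], tokens + ["step1", "step2"]

    def update(self, loss):
        # A real model would apply a gradient step; we just record the loss.
        self.loss_history.append(loss)


def training_step(model, example, trace_weight=0.5):
    """One update that evaluates both the response and the trace, per the abstract."""
    response, trace = model.generate(example["query"])
    response_score = token_f1(response, example["gold_response"])
    trace_score = token_f1(trace, example["gold_trace"])
    # Combined objective: the final answer and the reasoning trace are both supervised.
    loss = (1 - trace_weight) * (1 - response_score) + trace_weight * (1 - trace_score)
    model.update(loss)
    return loss
```

With a matching ground-truth response and trace, the combined loss goes to zero; mismatches in either component raise it, which is the point of supervising the trace alongside the answer.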
-
Publication number: US20240232637A9
Publication date: 2024-07-11
Application number: US18491877
Filing date: 2023-10-23
Applicant: Google LLC
Inventor: Krishna Pragash Srinivasan , Michael Bendersky , Anupam Samanta , Lingrui Liao , Luca Bertelli , Ming-Wei Chang , Iftekhar Naim , Siddhartha Brahma , Siamak Shakeri , Hongkun Yu , John Nham , Karthik Raman , Raphael Dominik Hoffmann
IPC: G06N3/0895 , G06F16/903 , G06F16/93 , G06N3/0455
CPC classification number: G06N3/0895 , G06F16/90335 , G06F16/93 , G06N3/0455
Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.
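The three-stage pipeline in this abstract can be outlined as a small sketch. This is an assumed illustration only: the `retrieve`, `teacher`, and `train` callables, the subset size, and the dictionary-backed student are hypothetical placeholders for the retrieval system, the first query processing model, and the distillation training steps.

```python
def multi_stage_distillation(unlabeled_queries, retrieve, teacher, train):
    """Sketch of the multi-stage distillation pipeline described in the abstract.

    `retrieve(query)` returns retrieved context, `teacher(query, context)` returns
    an inferred intent label, and `train(labeled_pairs)` returns a callable model.
    All three are hypothetical interfaces assumed for this illustration.
    """
    # Stage 1: augment a subset of the unlabeled queries with retrieved features,
    # then have the retrieval-augmented teacher annotate them with inferred labels.
    subset = unlabeled_queries[: max(1, len(unlabeled_queries) // 10)]
    augmented = [(q, retrieve(q)) for q in subset]
    inferred = [(q, teacher(q, ctx)) for q, ctx in augmented]

    # Stage 2: distill into a student that needs no retrieval at inference time.
    student = train(inferred)

    # Stage 3: the student labels the entire unlabeled set, and a third model is
    # trained on those labels, again without retrieval augmentation.
    full_labels = [(q, student(q)) for q in unlabeled_queries]
    return train(full_labels)
```

The design point is that retrieval is only paid for once, on a small teacher-labeled subset; each distillation stage removes the retrieval dependency while propagating its labels to the full corpus.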
-
Publication number: US20240135187A1
Publication date: 2024-04-25
Application number: US18491877
Filing date: 2023-10-22
Applicant: Google LLC
Inventor: Krishna Pragash Srinivasan , Michael Bendersky , Anupam Samanta , Lingrui Liao , Luca Bertelli , Ming-Wei Chang , Iftekhar Naim , Siddhartha Brahma , Siamak Shakeri , Hongkun Yu , John Nham , Karthik Raman , Raphael Dominik Hoffmann
IPC: G06N3/0895 , G06F16/903 , G06F16/93 , G06N3/0455
CPC classification number: G06N3/0895 , G06F16/90335 , G06F16/93 , G06N3/0455
Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.