-
Publication No.: US20250068847A1
Publication Date: 2025-02-27
Application No.: US18453236
Application Date: 2023-08-21
Applicant: Google LLC
Inventor: Vincent Perot, Florian Luisier, Kai Kang, Ramya Sree Boppana, Jiaqi Mu, Xiaoyu Sun, Carl Elie Saroufim, Guolong Su, Hao Zhang, Nikolay Alexeevich Glushnev, Nan Hua, Yun-Hsuan Sung, Michael Yiupun Kwong
IPC: G06F40/295, G06V30/19
Abstract: Systems and methods for performing document entity extraction are described herein. The method can include receiving an inference document and a target schema. The method can also include generating one or more document inputs from the inference document and one or more schema inputs from the target schema. The method can further include, for each combination of the document input and schema input, obtaining one or more extraction inputs by generating a respective extraction input based on the combination, providing the respective extraction input to the machine-learned model, and receiving a respective output of the machine-learned model based on the respective extraction input. The method can also include validating the extracted entity data based on reference spatial locations and inference spatial locations and outputting the validated extracted entity data.
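Illustrative sketch (not part of the publication): the flow in the abstract above could look roughly like the following Python. Every name here (Token, Entity, chunk, extract_entities, toy_model, the chunk sizes) is hypothetical. The validation rule is also an assumption: the abstract does not define "reference" and "inference" spatial locations, so this sketch simply keeps an entity only when its text and bounding box match a token actually present in the document.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) page coordinates


@dataclass(frozen=True)
class Token:
    text: str
    box: Box


@dataclass(frozen=True)
class Entity:
    field: str
    text: str
    box: Box


def chunk(items: Sequence, size: int) -> List[list]:
    """Split a sequence into consecutive pieces of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def extract_entities(
    tokens: List[Token],
    schema: List[str],
    model: Callable[[List[Token], List[str]], List[Entity]],
    doc_chunk: int = 512,
    schema_chunk: int = 8,
) -> List[Entity]:
    doc_inputs = chunk(tokens, doc_chunk)        # document inputs
    schema_inputs = chunk(schema, schema_chunk)  # schema inputs
    # Assumed validation reference: (text, location) pairs found in the document.
    reference = {(t.text, t.box) for t in tokens}
    validated = []
    # One extraction input per (document input, schema input) combination.
    for doc_in, schema_in in product(doc_inputs, schema_inputs):
        for entity in model(doc_in, schema_in):
            # Keep only predictions whose location agrees with the document.
            if (entity.text, entity.box) in reference:
                validated.append(entity)
    return validated


def toy_model(doc_in: List[Token], schema_in: List[str]) -> List[Entity]:
    """Stand-in for the machine-learned model; not a real API."""
    out = []
    if "total" in schema_in:
        out += [Entity("total", t.text, t.box) for t in doc_in if t.text.startswith("$")]
        # Deliberately ungrounded prediction, to exercise the validation step.
        out.append(Entity("total", "$999", (0, 0, 1, 1)))
    return out


tokens = [Token("Total", (10, 10, 50, 20)), Token("$42", (60, 10, 90, 20))]
result = extract_entities(tokens, ["date", "total"], toy_model, doc_chunk=2, schema_chunk=1)
```

In this toy run the schema is split into two schema inputs, so the model is called once per combination, and the ungrounded "$999" prediction is dropped during validation.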
-
Publication No.: US20250045316A1
Publication Date: 2025-02-06
Application No.: US18788178
Application Date: 2024-07-30
Applicant: Google LLC
Inventor: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Iftekhar Naim, Yi Luan, Blair Yuxin Chen, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Daniel Matthew Cer, Gustavo Adolfo Hernandez Abrego, Jeremy Robert Cole, Colin Hearne Evans, Yuzhe Zhao, Pranay Bhatia, Rajvi Kapadia, Riham Hassan Abdel-Moneim Mansour, Raphael Dominik Hoffman, Simon Kunio Tokumine, Scott Bradley Huffman, Stephen Zachary Karukas, Michael Yiupun Kwong, Shu Zheng, Yan Qiao, Lukas Rutishauser, Anand Rajan Iyer
Abstract: An example method includes providing, to a sequence model (i) a plurality of few-shot prompts, wherein each prompt comprises a demonstration passage, a demonstration task, and a demonstration query, wherein the demonstration task describes a type of retrieval, and wherein the demonstration query is relevant to the demonstration task, and (ii) a plurality of passages sampled from a corpus of passages. The method also includes receiving, from the sequence model and for the plurality of passages and based on the plurality of few-shot prompts, a respective plurality of predicted task-query pairs, the sequence model having been prompted to predict a task based on an input passage, and predict an output query relevant to the predicted task. The method further includes generating a synthetic training dataset comprising the plurality of passages and the respective plurality of predicted task-query pairs. The method also includes providing the synthetic training dataset.
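Illustrative sketch (not part of the publication): the data-generation loop in the abstract above might be approximated as follows. All names (build_prompt, parse_task_query, generate_synthetic_dataset, toy_sequence_model) and the "Passage / Task / Query" prompt layout are assumptions made for illustration. The sequence model is replaced by a stub, so this shows only the control flow, not any real model API.

```python
import random
from typing import Callable, Dict, List, Tuple


def build_prompt(demos: List[Dict[str, str]], passage: str) -> str:
    """Few-shot prompt: demonstrations first, then the new passage to complete."""
    parts = [
        f"Passage: {d['passage']}\nTask: {d['task']}\nQuery: {d['query']}\n"
        for d in demos
    ]
    parts.append(f"Passage: {passage}\nTask:")
    return "\n".join(parts)


def parse_task_query(completion: str) -> Tuple[str, str]:
    """Split an assumed 'task\\nQuery: query' completion into its two parts."""
    task, _, query = completion.partition("\nQuery:")
    return task.strip(), query.strip()


def generate_synthetic_dataset(
    corpus: List[str],
    demos: List[Dict[str, str]],
    sequence_model: Callable[[str], str],
    n_passages: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    rng = random.Random(seed)
    passages = rng.sample(corpus, n_passages)  # passages sampled from the corpus
    dataset = []
    for passage in passages:
        completion = sequence_model(build_prompt(demos, passage))
        task, query = parse_task_query(completion)  # predicted task-query pair
        dataset.append({"passage": passage, "task": task, "query": query})
    return dataset


def toy_sequence_model(prompt: str) -> str:
    """Stand-in for the sequence model; it just echoes the passage's first word."""
    passage = prompt.rsplit("Passage: ", 1)[1].split("\n", 1)[0]
    return f" question answering\nQuery: what does this say about {passage.split()[0]}?"


corpus = ["Cats sleep a lot.", "Rust has ownership.", "Tea contains caffeine."]
demos = [{
    "passage": "The Nile flows north.",
    "task": "question answering",
    "query": "which direction does the Nile flow?",
}]
dataset = generate_synthetic_dataset(corpus, demos, toy_sequence_model, n_passages=2)
```

Each resulting record pairs a sampled passage with the task and query the model predicted for it, which is the shape a downstream retriever-training step would consume.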
-