Publication No.: US11087081B1
Publication Date: 2021-08-10
Application No.: US16359930
Filing Date: 2019-03-20
Applicant: Amazon Technologies, Inc.
Inventors: Amulya Srivastava, Vivek Bhadauria, Gowtham Jeyabalan, Paul H. Kang, Mohammed El Hamalawi
IPC: G06F40/00, G06F40/186, G06N3/04, G06N20/00, G06F40/117, G06F40/169
Abstract: A synthetic document generator obtains a configuration for a synthetic document derived from real-world documents. The configuration specifies element templates to be included in the synthetic document and weights for the specified element templates. The system generates synthetic documents based on the configuration; the synthetic documents include diversified versions of the element templates specified in the configuration. For each synthetic document, an annotation document is generated that includes information describing that synthetic document. A machine learning model for analyzing real-world documents can then be trained using the synthetic and annotation documents. Feedback from the model's analysis of real-world documents can be used to generate a new configuration, which in turn is used to generate additional synthetic and annotation documents for further training of the model.
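
The configuration-driven generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the configuration layout (`CONFIG`, `num_elements`), the template names, the style variations (`FONTS`, `FONT_SIZES`), and the function `generate_document` are all hypothetical, chosen only to show how weighted element templates might be sampled, diversified, and paired with an annotation record.

```python
import random

# Hypothetical configuration: element template names mapped to sampling weights.
# The abstract only states that a configuration names templates and weights;
# this particular structure is an assumption.
CONFIG = {
    "num_elements": 4,
    "templates": {
        "title": 1.0,
        "paragraph": 3.0,
        "table": 1.5,
        "key_value": 2.0,
    },
}

# Hypothetical style variations used to "diversify" each sampled template.
FONTS = ["serif", "sans-serif", "monospace"]
FONT_SIZES = [9, 10, 11, 12, 14]


def generate_document(config, rng):
    """Return (synthetic_document, annotation_document) for one sampled layout."""
    names = list(config["templates"])
    weights = [config["templates"][name] for name in names]

    # Sample element templates in proportion to their configured weights.
    chosen = rng.choices(names, weights=weights, k=config["num_elements"])

    elements = []
    annotations = []
    for index, name in enumerate(chosen):
        # Diversify the template by randomizing its presentation.
        style = {"font": rng.choice(FONTS), "size": rng.choice(FONT_SIZES)}
        elements.append({"template": name, "style": style})
        # The annotation records what was generated, so it can serve as a label.
        annotations.append({"index": index, "label": name})

    return {"elements": elements}, {"elements": annotations}


if __name__ == "__main__":
    document, annotation = generate_document(CONFIG, random.Random(0))
    print(document)
    print(annotation)
```

In this sketch the annotation is produced alongside the document, so every synthetic sample arrives with its labels. A feedback loop of the kind the abstract mentions could, under the same assumptions, be modeled by raising the weights of templates the trained model handles poorly and generating again.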