-
Publication No.: US11257006B1
Publication Date: 2022-02-22
Application No.: US16196662
Application Date: 2018-11-20
Applicant: Amazon Technologies, Inc.
Inventor: Oron Anschel, Amit Adam, Shahar Tsiper, Hadar Averbuch Elor, Shai Mazor, Rahul Bhotika, Stefano Soatto
IPC: G06N20/00, G06K9/00, G06F40/169
Abstract: Techniques for auto-generation of annotated real-world training data are described. An electronic document is analyzed to determine text represented in the document and corresponding locations of the text. A representation of the electronic document is modified to include markers and printed. The printed document is photographed in real-world environments, and the markers within the digital photographs are analyzed to allow for the depiction of the document within the photographs to be rectified. The text and location data are used to annotate the rectified images.
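The rectification step described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that four markers have been detected in the photograph and that their positions in the original document are known; the function names (`solve_linear`, `homography`, `apply_h`) and the sample coordinates are hypothetical and are not taken from the patent. The sketch estimates a planar homography from the four correspondences and maps photo pixels back into document coordinates, where the known text locations can serve directly as annotations.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return h0..h7 (with h8 fixed to 1) mapping four src points to dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def apply_h(h: Sequence[float], p: Point) -> Point:
    """Map a point through the homography (projective division included)."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical marker positions: pixels in the photo, points on the page.
photo_markers = [(120.0, 80.0), (530.0, 95.0), (560.0, 700.0), (90.0, 690.0)]
doc_markers = [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)]
rectify = homography(photo_markers, doc_markers)
```

Once `rectify` is known, every pixel of the photograph can be resampled into the document frame, and a text bounding box taken from the electronic document applies to the rectified image without further transformation.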
-
Publication No.: US11308354B1
Publication Date: 2022-04-19
Application No.: US16834997
Application Date: 2020-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, Jonathan Wu, Raghavan Manmatha
Abstract: Techniques for recognizing text in an image are described. An exemplary method may include receiving a request to recognize text in an image; extracting features from the image and generating a visual feature sequence from the extracted features; performing selective contextual refinement using at least one selective contextual refinement block of a stack of selective contextual refinement blocks to generate a text prediction by: generating a contextual feature map and combining the contextual feature map with the visual feature sequence into a visual feature space, and applying a selective decoder that utilizes a two-step attention on the visual feature space to generate a text prediction, wherein the two-step attention includes performing a 1-D self-attention computation to generate attentional features and decoding the attentional features to generate the text prediction; and outputting the generated text prediction.
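The 1-D self-attention computation named in this abstract can be sketched in a few lines. This is a generic scaled dot-product self-attention over a feature sequence, written under the assumption that queries, keys, and values are the features themselves (no learned projections); the patent's actual selective decoder and its second, decoding step are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_1d(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention along a 1-D feature sequence.

    Assumption: identity projections stand in for learned weight matrices.
    """
    dim = len(seq[0])
    scale = math.sqrt(dim)
    attended = []
    for query in seq:
        # Similarity of this position to every position in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in seq]
        weights = softmax(scores)
        # Weighted average of all feature vectors.
        attended.append([sum(w * v[i] for w, v in zip(weights, seq))
                         for i in range(dim)])
    return attended
```

Each output vector is a convex combination of the input vectors, so positions with similar features reinforce one another before any decoding takes place.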
-
Publication No.: US11341605B1
Publication Date: 2022-05-24
Application No.: US16588503
Application Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Kunwar Yashraj Singh, Amit Adam, Shahar Tsiper, Gal Sabina Star, Roee Litman, Hadar Averbuch Elor, Vijay Mahadevan, Rahul Bhotika, Shai Mazor, Mohammed El Hamalawi
Abstract: Techniques for document rectification via homography recovery using machine learning are described. An image rectification system can intelligently make use of multiple pipelines for rectifying document images based on the detected type of device that generated the images. The image rectification system can provide high-quality rectifications without requiring human cooperation, multiple views of the document in multiple images, and/or without being constrained to only be able to process images from one source context.
-
Publication No.: US10970530B1
Publication Date: 2021-04-06
Application No.: US16189633
Application Date: 2018-11-13
Applicant: Amazon Technologies, Inc.
Inventor: Amit Adam, Oron Anschel, Or Perel, Gal Sabina Star, Omri Ben-Eliezer, Hadar Averbuch Elor, Shai Mazor, Wendy Tse, Andrea Olgiati, Rahul Bhotika, Stefano Soatto
IPC: G06K9/62, G06F40/137, G06F40/169, G06F40/174, G06K9/00, G06N20/00
Abstract: Techniques for grammar-based automated generation of annotated synthetic form training data for machine learning are described. A training data generation engine utilizes a defined grammar to construct a layout for a form, select key-value units to place within the layout, and select attribute variants for the key-value units. The form is rendered and stored at a storage location, where it can be provided along with other similarly-generated forms to be used as training data for a machine learning model.
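The three steps in this abstract (construct a layout from a grammar, select key-value units, select attribute variants) can be sketched as below. The grammar, the key-value units, and the attribute variants are all made up for illustration and are not taken from the patent; a real engine would render an image rather than plain text.

```python
import random
from typing import Dict, List

# Hypothetical layout grammar: non-terminals expand to one of several productions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "form": [["section"], ["section", "section"]],
    "section": [["row"], ["row", "row"], ["row", "row", "row"]],
    "row": [["kv", "end_row"], ["kv", "kv", "end_row"]],
}
# Hypothetical key-value units and attribute variants.
KEY_VALUE_UNITS = {
    "Name": ["Alice Smith", "Bob Jones"],
    "Date": ["2018-11-13", "2021-04-06"],
    "Amount": ["42.00", "1,250.75"],
}
ATTRIBUTE_VARIANTS = {"separator": [":", " -", ""], "font_size": [10, 12, 14]}


def expand(symbol: str, rng: random.Random) -> List[str]:
    """Recursively expand a symbol into a flat list of terminals."""
    if symbol not in GRAMMAR:
        return [symbol]
    terminals: List[str] = []
    for child in rng.choice(GRAMMAR[symbol]):
        terminals.extend(expand(child, rng))
    return terminals


def generate_form(seed: int) -> List[dict]:
    """Build one annotated form; the annotations are known by construction."""
    rng = random.Random(seed)
    fields, row, col = [], 0, 0
    for terminal in expand("form", rng):
        if terminal == "end_row":
            row, col = row + 1, 0
            continue
        key = rng.choice(sorted(KEY_VALUE_UNITS))
        fields.append({
            "row": row, "col": col, "key": key,
            "value": rng.choice(KEY_VALUE_UNITS[key]),
            "separator": rng.choice(ATTRIBUTE_VARIANTS["separator"]),
            "font_size": rng.choice(ATTRIBUTE_VARIANTS["font_size"]),
        })
        col += 1
    return fields


def render(fields: List[dict]) -> str:
    """Render the form as plain text, one line per layout row."""
    rows: Dict[int, List[str]] = {}
    for f in fields:
        rows.setdefault(f["row"], []).append(f"{f['key']}{f['separator']} {f['value']}")
    return "\n".join("    ".join(rows[r]) for r in sorted(rows))
```

Calling `generate_form` with different seeds yields different forms, and each returned field list doubles as the ground-truth annotation for the text produced by `render`.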
-
Publication No.: US10949661B2
Publication Date: 2021-03-16
Application No.: US16198040
Application Date: 2018-11-21
Applicant: Amazon Technologies, Inc.
Inventor: Rahul Bhotika, Shai Mazor, Amit Adam, Wendy Tse, Andrea Olgiati, Bhavesh Doshi, Gururaj Kosuru, Patrick Ian Wilson, Umar Farooq, Anand Dhandhania
IPC: G06K9/00
Abstract: Techniques for layout-agnostic complex document processing are described. A document processing service can analyze documents that do not adhere to defined layout rules in an automated manner to determine the content and meaning of a variety of types of segments within the documents. The service may chunk a document into multiple chunks, and operate upon the chunks in parallel by identifying segments within each chunk, classifying the segments into segment types, and processing the segments using special-purpose analysis engines adapted for the analysis of particular segment types to generate results that can be aggregated into an overall output for the entire document that captures the meaning and context of the document text.
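The chunk, classify, dispatch, and aggregate flow in this abstract can be sketched as follows. The segment classifier here is a toy heuristic and the three "engines" are placeholders; both are assumptions made for illustration, not the service's actual models, and a line of text stands in for a document segment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def chunk(lines: List[str], size: int) -> List[List[str]]:
    """Split the document into fixed-size chunks of lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def classify(segment: str) -> str:
    """Toy segment classifier (assumption: one line is one segment)."""
    if "|" in segment:
        return "table"
    if ":" in segment:
        return "key_value"
    return "text"


def analyze_table(segment: str) -> dict:
    return {"type": "table", "cells": [c.strip() for c in segment.split("|")]}


def analyze_key_value(segment: str) -> dict:
    key, _, value = segment.partition(":")
    return {"type": "key_value", "key": key.strip(), "value": value.strip()}


def analyze_text(segment: str) -> dict:
    return {"type": "text", "text": segment.strip()}


# One special-purpose engine per segment type.
ENGINES: Dict[str, Callable[[str], dict]] = {
    "table": analyze_table,
    "key_value": analyze_key_value,
    "text": analyze_text,
}


def process_chunk(chunk_lines: List[str]) -> List[dict]:
    """Identify, classify, and analyze every non-empty segment in a chunk."""
    return [ENGINES[classify(line)](line) for line in chunk_lines if line.strip()]


def process_document(lines: List[str], chunk_size: int = 2) -> List[dict]:
    """Process chunks in parallel and aggregate results in document order."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(process_chunk, chunk(lines, chunk_size))
        return [item for part in parts for item in part]
```

`ThreadPoolExecutor.map` returns results in submission order, so the aggregated output follows the order of the original document even though chunks are processed concurrently.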