-
Publication No.: US20210110153A1
Publication Date: 2021-04-15
Application No.: US16597318
Filing Date: 2019-10-09
Applicant: Adobe Inc.
Inventor: Mohit Gupta, Uttam Dwivedi, Shawn Alan Gaither, Jayant Vaibhav Srivastava, Ashutosh Mehra
Abstract: Techniques described herein implement heading identification and classification for a digital document in a digital medium environment. A document analysis system is leveraged to extract structural features from a digital document, identify heading candidates from among the structural features, validate the heading candidates, and classify validated headings into different heading types. The classified headings are then utilized to generate a sectioned version of the digital document (“sectioned document”) that is divided into different sections based on the headings. Further, a document directory is generated that includes the headings and that enables navigation to different sections of the sectioned document.
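The heading pipeline summarized in this abstract (extract structural features, pick heading candidates, validate them, classify them into heading types, section the document, build a directory) can be illustrated with a minimal Python sketch. It is not the method claimed in the patent. It assumes each text line already carries a font size and a bold flag (the hypothetical `Line` type), takes the most common font size as body text, treats short lines that are larger or bold as candidates, rejects candidates ending in a period as a stand-in validation step, and assigns heading levels by descending font size. All names here (`classify_headings`, `section_document`, `SAMPLE`) are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical structural features for one extracted text line.
    text: str
    font_size: float
    bold: bool = False


def classify_headings(lines, max_words=12):
    """Return (line index, heading level) pairs; level 1 is the top level."""
    # Assumption: the most common font size is the body text size.
    body_size = Counter(line.font_size for line in lines).most_common(1)[0][0]
    # Candidates: short lines that are larger than body text or bold.
    candidates = [
        i for i, line in enumerate(lines)
        if (line.font_size > body_size or line.bold)
        and len(line.text.split()) <= max_words
    ]
    # Validation (simple heuristic): headings rarely end with a period.
    validated = [i for i in candidates if not lines[i].text.rstrip().endswith(".")]
    # Classification: each distinct heading font size becomes one level.
    sizes = sorted({lines[i].font_size for i in validated}, reverse=True)
    level_of = {size: rank + 1 for rank, size in enumerate(sizes)}
    return [(i, level_of[lines[i].font_size]) for i in validated]


def section_document(lines, headings):
    """Split lines into sections and build a directory of (level, title, section index)."""
    levels = dict(headings)
    sections = [{"title": None, "level": 0, "body": []}]  # text before the first heading
    directory = []
    for i, line in enumerate(lines):
        if i in levels:
            sections.append({"title": line.text, "level": levels[i], "body": []})
            directory.append((levels[i], line.text, len(sections) - 1))
        else:
            sections[-1]["body"].append(line.text)
    return sections, directory


SAMPLE = [
    Line("Annual Report", 20),
    Line("Overview", 14, bold=True),
    Line("This report covers the fiscal year.", 10),
    Line("Revenue grew in every region.", 10),
    Line("Details", 14, bold=True),
    Line("See the appendix for figures.", 10),
]

headings = classify_headings(SAMPLE)
sections, directory = section_document(SAMPLE, headings)
print(directory)
# [(1, 'Annual Report', 1), (2, 'Overview', 2), (2, 'Details', 3)]
```

The directory entries point at section indices, which is one simple way to support navigation; a real system would more likely learn the validation and classification steps rather than rely on fixed font-size rules.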
-
Publication No.: US10956731B1
Publication Date: 2021-03-23
Application No.: US16597318
Filing Date: 2019-10-09
Applicant: Adobe Inc.
Inventor: Mohit Gupta, Uttam Dwivedi, Shawn Alan Gaither, Jayant Vaibhav Srivastava, Ashutosh Mehra
IPC: G06K9/00, G06F40/258, G06F3/0482, G06F40/106, G06K9/62, G06F16/93, G06F40/197
Abstract: Techniques described herein implement heading identification and classification for a digital document in a digital medium environment. A document analysis system is leveraged to extract structural features from a digital document, identify heading candidates from among the structural features, validate the heading candidates, and classify validated headings into different heading types. The classified headings are then utilized to generate a sectioned version of the digital document (“sectioned document”) that is divided into different sections based on the headings. Further, a document directory is generated that includes the headings and that enables navigation to different sections of the sectioned document.
-
Publication No.: US10949604B1
Publication Date: 2021-03-16
Application No.: US16664335
Filing Date: 2019-10-25
Applicant: Adobe Inc.
Inventor: Uttam Dwivedi, Mohit Gupta, Ashutosh Mehra
IPC: G06F17/00, G06F40/114, G06K9/00
Abstract: Techniques described herein implement identifying artifacts in digital documents in a digital medium environment. A document analysis system is leveraged to extract page features from a digital document and to determine whether certain page features represent page artifacts such as headers and footers. Those page features determined to be page artifacts can be extracted from the digital document to generate a reflowed version of the digital document that preserves primary content. The primary content, for instance, is rearranged in the reflowed document to compensate for the extracted page artifacts.
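To make the header/footer idea concrete, here is a minimal Python sketch under stated assumptions; it is not the technique claimed in the patent. It assumes each page is already a list of `(text, y)` tuples, with `y` the line's vertical position normalized to 0.0 (top) through 1.0 (bottom). A line in the top or bottom margin band is treated as a page artifact when the same text, with digit runs collapsed so page numbers match, recurs in that band on at least half the pages. The "reflow" step here simply drops the artifacts and keeps the primary content in reading order. The function names, `band`, and `min_fraction` are hypothetical.

```python
import re
from collections import defaultdict


def _region(y, band):
    # y is the line's vertical position on the page: 0.0 = top, 1.0 = bottom.
    if y <= band:
        return "top"
    if y >= 1.0 - band:
        return "bottom"
    return None


def _normalize(text):
    # Collapse digit runs so running footers like "Page 3" and "Page 4" match.
    return re.sub(r"\d+", "#", text.strip().lower())


def find_artifacts(pages, band=0.1, min_fraction=0.5):
    """Return {(page index, line index)} for lines judged to be headers or footers."""
    seen_on = defaultdict(set)  # (normalized text, region) -> page indices
    for p, page in enumerate(pages):
        for text, y in page:
            region = _region(y, band)
            if region:
                seen_on[(_normalize(text), region)].add(p)
    # A margin line is an artifact if it recurs on enough pages (never just one).
    threshold = max(2, min_fraction * len(pages))
    repeated = {key for key, found in seen_on.items() if len(found) >= threshold}
    return {
        (p, i)
        for p, page in enumerate(pages)
        for i, (text, y) in enumerate(page)
        if _region(y, band) and (_normalize(text), _region(y, band)) in repeated
    }


def reflow(pages, artifacts):
    """Drop artifact lines and join the remaining primary content in reading order."""
    return [
        text
        for p, page in enumerate(pages)
        for i, (text, _y) in enumerate(page)
        if (p, i) not in artifacts
    ]


SAMPLE_PAGES = [
    [("ACME Confidential", 0.03), ("Intro paragraph.", 0.40), ("Page 1", 0.97)],
    [("ACME Confidential", 0.03), ("Second paragraph.", 0.50), ("Page 2", 0.97)],
    [("ACME Confidential", 0.03), ("Third paragraph.", 0.50), ("Page 3", 0.97)],
]

print(reflow(SAMPLE_PAGES, find_artifacts(SAMPLE_PAGES)))
# ['Intro paragraph.', 'Second paragraph.', 'Third paragraph.']
```

Requiring a repeat across pages keeps a one-off title that happens to sit near the top of a page from being stripped as a header.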
-
Publication No.: US20230094787A1
Publication Date: 2023-03-30
Application No.: US17490610
Filing Date: 2021-09-30
Applicant: Adobe Inc.
Inventor: Ankit Bal, Mohit Gupta, Ram Bhushan Agrawal, Tarun Verma, Uttam Dwivedi
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately enhancing optical character recognition with a machine learning approach for determining words from reverse text, vertical text, and atypically-sized text. For example, the disclosed systems segment a digital image into text regions and non-text regions utilizing an object detection machine learning model. Within the text regions, the disclosed systems can determine reverse text glyphs, vertical text glyphs, and/or atypically-sized text glyphs utilizing an edge-based adaptive binarization model. Additionally, the disclosed systems can utilize respective modification techniques to manipulate reverse text glyphs, vertical text glyphs, and/or atypically-sized glyphs for analysis by an optical character recognition model. The disclosed systems can further utilize an optical character recognition model to determine words from the modified versions of the reverse text glyphs, the vertical text glyphs, and/or the atypically-sized text glyphs.
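The three per-region modifications described above (inverting reverse text, rotating vertical text, rescaling atypically-sized text before OCR) can be sketched in plain Python, with a grayscale text region represented as a list of rows of 0 to 255 pixel values. This sketch assumes segmentation into text regions has already happened upstream, and it deliberately swaps in simple heuristics: a majority-dark pixel count instead of the edge-based adaptive binarization model named in the abstract, a height-to-width ratio test for vertical text, and nearest-neighbour rescaling to a fixed height. All function names and thresholds are hypothetical.

```python
def is_reverse_text(region, threshold=128):
    """Heuristic: treat a region as light-on-dark text if most pixels are dark."""
    pixels = [p for row in region for p in row]
    dark = sum(1 for p in pixels if p < threshold)
    return dark > len(pixels) / 2


def invert(region):
    # Turn light-on-dark text into dark-on-light text for the OCR engine.
    return [[255 - p for p in row] for row in region]


def is_vertical(region, ratio=2.0):
    # Assumption: a text region much taller than it is wide holds vertical text.
    return len(region) >= ratio * len(region[0])


def rotate_clockwise(region):
    # Rotate 90 degrees clockwise; the correct direction depends on the layout.
    return [list(row) for row in zip(*region[::-1])]


def resize_nearest(region, target_height):
    """Nearest-neighbour rescale so the region is target_height pixels tall."""
    h, w = len(region), len(region[0])
    scale = target_height / h
    new_w = max(1, round(w * scale))
    return [
        [region[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))] for x in range(new_w)]
        for y in range(target_height)
    ]


def normalize_for_ocr(region, target_height=32):
    """Apply the three modifications before handing the region to an OCR model."""
    if is_reverse_text(region):
        region = invert(region)
    if is_vertical(region):
        region = rotate_clockwise(region)
    if len(region) != target_height:
        region = resize_nearest(region, target_height)
    return region


# A 4x2 light-on-dark vertical strip (0 = black, 255 = white).
strip = [[0, 0], [0, 255], [0, 0], [0, 0]]
print(normalize_for_ocr(strip, target_height=2))
# [[255, 255, 255, 255], [255, 255, 0, 255]]
```

Inverting before the orientation test keeps the two checks independent, since inversion changes pixel values but not the region's shape.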