Identifying artifacts in digital documents

    Publication (announcement) No.: US10949604B1

    Publication (announcement) date: 2021-03-16

    Application No.: US16664335

    Filing date: 2019-10-25

    Applicant: Adobe Inc.

    Abstract: Techniques described herein implement identifying artifacts in digital documents in a digital medium environment. A document analysis system is leveraged to extract page features from a digital document and to determine whether certain page features represent page artifacts such as headers and footers. Those page features determined to be page artifacts can be extracted from the digital document to generate a reflowed version of the digital document that preserves primary content. The primary content, for instance, is rearranged in the reflowed document to compensate for the extracted page artifacts.
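The abstract does not say how page features are classified as headers or footers. The sketch below swaps in a common, much simpler heuristic, not the patented method: a line is treated as an artifact if similar text recurs in the top or bottom band of at least half the pages. Digits are collapsed so running page numbers match. The `Line` type, the `band` and `min_fraction` parameters, and the `find_artifacts`/`reflow` helpers are all hypothetical names chosen for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    y: float  # hypothetical normalized vertical position: 0.0 = page top, 1.0 = page bottom


def normalize(text: str) -> str:
    # Collapse digit runs so "Page 3" and "Page 4" compare equal.
    return re.sub(r"\d+", "#", text.strip().lower())


def in_margin_band(line: Line, band: float) -> bool:
    # True if the line sits in the top or bottom band of the page.
    return line.y <= band or line.y >= 1.0 - band


def find_artifacts(pages, band=0.1, min_fraction=0.5):
    """Heuristic (not the patent's classifier): margin text repeated across many pages."""
    counts = Counter()
    for page in pages:
        # Count each normalized candidate at most once per page.
        counts.update({normalize(l.text) for l in page if in_margin_band(l, band)})
    needed = max(2, min_fraction * len(pages))
    return {text for text, n in counts.items() if n >= needed}


def reflow(pages, band=0.1, min_fraction=0.5) -> str:
    """Drop margin artifacts and join the remaining primary content."""
    artifacts = find_artifacts(pages, band, min_fraction)
    body = [
        l.text
        for page in pages
        for l in page
        if not (in_margin_band(l, band) and normalize(l.text) in artifacts)
    ]
    return "\n".join(body)
```

For three pages that each carry the header "ACME Report" and a footer "Page N", `find_artifacts` would return `{"acme report", "page #"}` and `reflow` would keep only the body lines. Removal is restricted to the margin bands so that the same text appearing in the body is preserved.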

    Utilizing machine-learning based object detection to improve optical character recognition

    Publication (announcement) No.: US20230094787A1

    Publication (announcement) date: 2023-03-30

    Application No.: US17490610

    Filing date: 2021-09-30

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately enhancing optical character recognition with a machine learning approach for determining words from reverse text, vertical text, and atypically-sized text. For example, the disclosed systems segment a digital image into text regions and non-text regions utilizing an object detection machine learning model. Within the text regions, the disclosed systems can determine reverse text glyphs, vertical text glyphs, and/or atypically-sized text glyphs utilizing an edge based adaptive binarization model. Additionally, the disclosed systems can utilize respective modification techniques to manipulate reverse text glyphs, vertical text glyphs, and/or atypically-sized glyphs for analysis by an optical character recognition model. The disclosed systems can further utilize an optical character recognition model to determine words from the modified versions of the reverse text glyphs, the vertical text glyphs, and/or the atypically-sized text glyphs.
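The abstract names an edge-based adaptive binarization model but gives no details. The sketch below is only an illustration of the kind of preprocessing described, under stated assumptions: it substitutes a plain global-mean threshold for that model, flags a region as reverse text when dark pixels are the majority (light glyphs on a dark background), inverts it, and rotates tall regions 90 degrees clockwise on the assumption that vertical text reads bottom-to-top. All function names and the `vertical_ratio` parameter are hypothetical; atypically-sized text is not handled.

```python
from typing import List

Image = List[List[int]]  # grayscale region: 0 = black, 255 = white


def binarize(region: Image) -> List[List[int]]:
    """Global-mean threshold, a stand-in for the patent's edge-based adaptive model.
    Returns 1 for dark pixels and 0 for light pixels."""
    pixels = [p for row in region for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[1 if p < threshold else 0 for p in row] for row in region]


def is_reverse_text(region: Image) -> bool:
    """Assumed heuristic: a mostly dark region means light text on a dark background."""
    binary = binarize(region)
    dark = sum(sum(row) for row in binary)
    total = sum(len(row) for row in binary)
    return dark > total / 2


def invert(region: Image) -> Image:
    # Map light-on-dark to dark-on-light, which typical OCR engines expect.
    return [[255 - p for p in row] for row in region]


def rotate_clockwise(region: Image) -> Image:
    # Reversing the rows and transposing yields a 90-degree clockwise rotation.
    return [list(col) for col in zip(*region[::-1])]


def prepare_for_ocr(region: Image, vertical_ratio: float = 2.0) -> Image:
    """Normalize a detected text region before handing it to an OCR model."""
    if is_reverse_text(region):
        region = invert(region)
    height, width = len(region), len(region[0])
    # Assumption: regions much taller than wide hold bottom-to-top vertical text.
    if height >= vertical_ratio * width:
        region = rotate_clockwise(region)
    return region
```

A real system would need to detect the reading direction of vertical text rather than assume it, and a local or edge-aware threshold would cope better with uneven backgrounds than the single global mean used here.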
