DEEP LEARNING TECHNIQUES FOR EXTRACTION OF EMBEDDED DATA FROM DOCUMENTS

    Publication No.: US20230139397A1

    Publication date: 2023-05-04

    Application No.: US17819445

    Filing date: 2022-08-12

    Abstract: Deep learning techniques are disclosed for extraction of embedded data from documents. In an exemplary technique, a set of unstructured text data is received. One or more text groupings are generated by processing the set of unstructured text data. Based on the one or more generated text groupings, one or more text grouping embeddings are generated in a format suitable for input to a machine learning model. One or more output predictions are generated by inputting the one or more text grouping embeddings into the machine learning model. Each of the output predictions corresponds to a predicted aspect of one of the text groupings.
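
The abstract describes a pipeline: unstructured text, then text groupings, then embeddings in a model-ready format, then one prediction per grouping. The sketch below only illustrates that data flow under loudly stated assumptions. The abstract does not say how groupings are formed or which model is used, so this sketch assumes one grouping per non-empty line, uses a bag-of-words vector as the "embedding", and swaps in a toy nearest-centroid classifier where the patent describes a deep learning model. The names `group_text`, `NearestCentroidTagger`, `extract`, and the labels in the usage are hypothetical and do not come from the patent.

```python
import math
from typing import Dict, List, Tuple


def group_text(raw: str) -> List[str]:
    # Assumption (hypothetical): one text grouping per non-empty line.
    # The abstract does not say how groupings are formed.
    return [line.strip() for line in raw.splitlines() if line.strip()]


class NearestCentroidTagger:
    """Hypothetical toy stand-in for the trained model (not a deep network)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}
        self.centroids: Dict[str, List[float]] = {}

    def embed(self, grouping: str) -> List[float]:
        # Fixed-length bag-of-words vector over the fitted vocabulary,
        # L2-normalised; tokens not seen during fitting are ignored.
        vec = [0.0] * len(self.vocab)
        for tok in grouping.lower().split():
            idx = self.vocab.get(tok)
            if idx is not None:
                vec[idx] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def fit(self, examples: Dict[str, List[str]]) -> "NearestCentroidTagger":
        # Build the vocabulary first so every embedding has the same length.
        for texts in examples.values():
            for text in texts:
                for tok in text.lower().split():
                    self.vocab.setdefault(tok, len(self.vocab))
        # One centroid (mean embedding) per label.
        for label, texts in examples.items():
            vecs = [self.embed(t) for t in texts]
            self.centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
        return self

    def _score(self, vec: List[float], label: str) -> float:
        # Dot-product similarity between an embedding and a label's centroid.
        return sum(a * b for a, b in zip(vec, self.centroids[label]))

    def predict(self, embeddings: List[List[float]]) -> List[str]:
        # One predicted label per embedding; ties go to the first label.
        return [max(self.centroids, key=lambda lbl: self._score(vec, lbl))
                for vec in embeddings]


def extract(raw: str, model: NearestCentroidTagger) -> List[Tuple[str, str]]:
    groupings = group_text(raw)                       # text -> groupings
    embeddings = [model.embed(g) for g in groupings]  # groupings -> vectors
    predictions = model.predict(embeddings)           # vectors -> predictions
    return list(zip(groupings, predictions))


if __name__ == "__main__":
    model = NearestCentroidTagger().fit({
        "invoice_id": ["Invoice number INV-001"],
        "amount": ["Total amount due 100.00"],
    })
    print(extract("Invoice number INV-777\n\nTotal amount due 42.50\n", model))
```

Only the three-stage shape of `extract()` reflects the abstract. A system of the kind the abstract describes would presumably replace `embed()` with a learned encoder and `predict()` with a trained neural network.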

    GENERATING AN ELECTRONIC DOCUMENT WITH A CONSISTENT TEXT ORDERING

    Publication No.: US20240061989A1

    Publication date: 2024-02-22

    Application No.: US18169740

    Filing date: 2023-02-15

    CPC classification number: G06F40/103 G06F40/205 G06F40/284 G06F40/30

    Abstract: Techniques are disclosed for generating text content arranged in a consistent reading order from a source document that includes text with different reading orders. A system parses a binary file representing an electronic document to identify characters and the metadata associated with them. The system pre-sorts the characters in each line of the electronic document to generate an ordered list of characters arranged in right-to-left reading order. The system then performs a layout-mirroring operation that changes the positions of characters within the modified document relative to the document's right and left edges. After layout mirroring, the system identifies native left-to-right text that appears in-line with the native right-to-left text, and flips those left-to-right characters back into left-to-right reading order so that they are consistent with the surrounding native right-to-left text.
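
The four steps in this abstract (pre-sort right-to-left, mirror the layout, find in-line left-to-right text, flip it) can be illustrated on a single line of positioned glyphs. This is a minimal sketch, not the claimed method. It assumes a hypothetical `Char` record with an `x` position and `width` (standing in for the parsed character metadata), a hypothetical `page_width`, and uses Python's `unicodedata.bidirectional` classes to decide direction, treating `L` (Latin letters) and `EN` (European digits) as left-to-right and `R`/`AL` as strong right-to-left. The abstract does not say how left-to-right text is detected.

```python
import unicodedata
from dataclasses import dataclass
from typing import List

STRONG_RTL = {"R", "AL"}  # Hebrew / Arabic letters
LTR = {"L", "EN"}         # assumption: Latin letters and European digits


@dataclass
class Char:
    text: str      # a single character
    x: float       # left edge on the page (hypothetical units)
    width: float


def bidi(c: Char) -> str:
    return unicodedata.bidirectional(c.text)


def presort_rtl(line: List[Char]) -> List[Char]:
    # Right-to-left reading order: rightmost glyph first.
    return sorted(line, key=lambda c: c.x, reverse=True)


def mirror(line: List[Char], page_width: float) -> List[Char]:
    # Reflect each glyph across the page's vertical centre line, so distance
    # from the right edge becomes distance from the left edge.
    return [Char(c.text, page_width - c.x - c.width, c.width) for c in line]


def flip_ltr_runs(line: List[Char]) -> List[Char]:
    # Reverse each maximal left-to-right run back into reading order.
    # Only the sequence changes; glyphs keep their mirrored coordinates.
    out = list(line)
    i, n = 0, len(out)
    while i < n:
        if bidi(out[i]) not in LTR:
            i += 1
            continue
        j, last = i, i
        # Extend across LTR chars and neutrals (e.g. spaces) until strong RTL.
        while j < n and bidi(out[j]) not in STRONG_RTL:
            if bidi(out[j]) in LTR:
                last = j
            j += 1
        out[i:last + 1] = out[i:last + 1][::-1]  # trailing neutrals stay put
        i = j
    return out


def consistent_order(line: List[Char], page_width: float) -> str:
    mirrored = mirror(presort_rtl(line), page_width)
    return "".join(c.text for c in flip_ltr_runs(mirrored))


if __name__ == "__main__":
    shalom = "\u05e9\u05dc\u05d5\u05dd"            # Hebrew word, logical order
    visual = "ABC DEF " + shalom[::-1]              # glyphs as laid out, left to right
    glyphs = [Char(ch, 10.0 * k, 10.0) for k, ch in enumerate(visual)]
    print(consistent_order(glyphs, page_width=10.0 * len(visual)))
```

Reversing each maximal left-to-right span, including its interior spaces, keeps a multi-word phrase such as "ABC DEF" in its original word order. Reversing each word separately would fix the letters but leave the words in right-to-left sequence.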
