Rights mapping system and method
    Granted invention patent

    Publication No.: US12230048B1

    Publication date: 2025-02-18

    Application No.: US18137590

    Filing date: 2023-04-21

    Abstract: A method and system can include processing title and title opinion document images to generate text information. Trained models may generate data objects representative of periods of time during which certain rights to a property exist. The trained models may also generate rules for modifying the data objects and interrelating the data objects to each other. In some examples, a confidence level can be generated that reflects the likelihood that a data object includes correct information. The modified and interrelated data objects may be used to generate a navigable interface which includes a current title status for a property and a navigable chain of title reflecting historical rights to the property.
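
To make the idea of period-based data objects concrete, here is a minimal sketch in Python. It is not the patented method: the `RightsPeriod` class, its fields, and both helper functions are hypothetical names chosen for illustration, and the confidence value is simply stored rather than produced by a trained model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RightsPeriod:
    """Hypothetical data object: one holder's right over one period."""
    holder: str
    right: str
    start: date
    end: Optional[date]   # None means the right is still in effect
    confidence: float     # assumed to come from an upstream model


def chain_of_title(periods: List[RightsPeriod]) -> List[RightsPeriod]:
    """Order the periods chronologically to form a navigable chain."""
    return sorted(periods, key=lambda p: p.start)


def current_holders(periods: List[RightsPeriod], as_of: date) -> List[str]:
    """Return holders whose period covers the given date (end is exclusive)."""
    return [
        p.holder
        for p in periods
        if p.start <= as_of and (p.end is None or as_of < p.end)
    ]
```

Calling `current_holders` with today's date would give the current title status under these assumptions, while `chain_of_title` gives the historical view.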

    ANNOTATION ALIGNMENT FOR CHARACTER RECOGNITION IN DOCUMENTS

    Publication No.: US20250054325A1

    Publication date: 2025-02-13

    Application No.: US18231652

    Filing date: 2023-08-08

    Applicant: SAP SE

    Abstract: Systems and processes for aligning weakly-annotated data to recognized characters in a document are provided. In a method for aligning annotation data to recognized characters, annotation words and character recognition tokens are received, and a search algorithm is performed to align the annotation words to the tokens in a stepwise manner. At each step, an annotation word is aligned to one or more tokens, and a cost of each respective alignment is calculated. Once all annotation words are aligned, a full set of annotation word-token pairs corresponding to the annotation is selected based on a total cost of alignment for that set. A bounding box enclosing the tokens in the selected full set is generated and output to a target application.
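
The stepwise alignment described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the algorithm claimed in the application: it substitutes a small memoized dynamic program for the unspecified search algorithm, uses edit distance as the alignment cost, and assumes each annotation word maps to between one and `max_span` consecutive tokens. The function names and the `(text, box)` token layout are hypothetical.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def align(words, tokens, max_span=3):
    """Align annotation words to OCR tokens and return the cheapest full set.

    tokens is a list of (text, (x0, y0, x1, y1)) tuples in reading order.
    Returns (total_cost, pairs, bbox), where each pair is
    (word_index, first_token_index, end_token_index_exclusive).
    """
    if not words or not tokens:
        raise ValueError("need at least one word and one token")
    texts = [text for text, _ in tokens]
    inf = float("inf")

    @lru_cache(maxsize=None)
    def best(w, t):
        # Cheapest way to align words[w:] to consecutive tokens starting at t.
        if w == len(words):
            return 0, ()
        options = []
        for span in range(1, max_span + 1):
            end = t + span
            if end > len(texts):
                break
            cost = edit_distance(words[w], "".join(texts[t:end]))
            rest_cost, rest = best(w + 1, end)
            options.append((cost + rest_cost, ((w, t, end),) + rest))
        return min(options) if options else (inf, ())

    total, pairs = min(best(0, start) for start in range(len(texts)))
    if total == inf:
        raise ValueError("not enough tokens to align every word")
    boxes = [tokens[i][1] for _, s, e in pairs for i in range(s, e)]
    bbox = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    return total, pairs, bbox
```

The returned bounding box encloses every token in the selected set, which corresponds to the output step in the abstract. A production system would likely prune the search rather than score every start position.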

    METHODS AND APPARATUS FOR EXTRACTING DATA FROM A DOCUMENT BY ENCODING IT WITH TEXTUAL AND VISUAL FEATURES AND USING MACHINE LEARNING

    Publication No.: US20250005952A1

    Publication date: 2025-01-02

    Application No.: US18759395

    Filing date: 2024-06-28

    Abstract: An apparatus including a processor caused to receive document images, each including representations of characters. The processor is caused to parse each document image to extract, based on structure type, subsets of characters, to generate a text encoding for that document image. For each document, the processor is caused to extract visual features to generate a visual encoding for that document image, each visual feature associated with a subset of characters. The processor is caused to generate parsed documents, each parsed document uniquely associated with a document image and based on the text and visual encoding for that document image. For each parsed document, the processor is caused to identify sections uniquely associated with section type. The processor is caused to train machine learning models, each machine learning model associated with one section type and trained using a portion of each parsed document associated with that section type.
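
The last step of this abstract, training one model per section type on the matching portion of each parsed document, implies a routing step that groups sections by type. A minimal sketch of that grouping follows. The `Section` and `ParsedDocument` classes and the function name are hypothetical, the encodings are plain lists standing in for real feature vectors, and no actual model training is shown.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Section:
    section_type: str
    text_encoding: List[float]     # stand-in for encoded characters
    visual_encoding: List[float]   # stand-in for layout or visual features


@dataclass
class ParsedDocument:
    image_id: str                  # the document image this was parsed from
    sections: List[Section] = field(default_factory=list)


def training_sets_by_section_type(
    docs: List[ParsedDocument],
) -> Dict[str, List[Tuple[str, List[float]]]]:
    """Group combined text+visual features by section type.

    Each resulting list is the data that a model dedicated to that
    section type would be trained on.
    """
    sets = defaultdict(list)
    for doc in docs:
        for section in doc.sections:
            features = section.text_encoding + section.visual_encoding
            sets[section.section_type].append((doc.image_id, features))
    return dict(sets)
```

Concatenating the two encodings is only one possible way to combine them; the abstract does not say how the text and visual encodings are merged.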

    Methods and apparatus for extracting data from a document by encoding it with textual and visual features and using machine learning

    Publication No.: US12183106B1

    Publication date: 2024-12-31

    Application No.: US18759395

    Filing date: 2024-06-28

    Abstract: An apparatus including a processor caused to receive document images, each including representations of characters. The processor is caused to parse each document image to extract, based on structure type, subsets of characters, to generate a text encoding for that document image. For each document, the processor is caused to extract visual features to generate a visual encoding for that document image, each visual feature associated with a subset of characters. The processor is caused to generate parsed documents, each parsed document uniquely associated with a document image and based on the text and visual encoding for that document image. For each parsed document, the processor is caused to identify sections uniquely associated with section type. The processor is caused to train machine learning models, each machine learning model associated with one section type and trained using a portion of each parsed document associated with that section type.

    Removal of sensitive data from documents for use as training sets

    Publication No.: US12182308B2

    Publication date: 2024-12-31

    Application No.: US17309198

    Filing date: 2019-11-07

    Abstract: Systems and methods relating to the replacement or removal of sensitive data in images of documents. An initial image of a document with sensitive data is received at an execution module and changes are made based on the execution module's training. The changes include replacing or effectively removing the sensitive data from the image of the document. The resulting sanitized image is then sent to a user for validation of the changes. The feedback from the user is then used in training the execution module to refine its behaviour when applying changes to other initial images of documents. To train the execution module, training data sets of document images with sensitive data manually tagged by users are used. The execution module thus learns to identify sensitive data and its submodules replace that sensitive data with suitable replacement data. The feedback from the user works to improve the resulting sanitized images from the execution module.
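
The sanitize-then-validate loop in this abstract can be illustrated with a deliberately simplified sketch. It differs from the patent in two labeled ways: it works on already-extracted text rather than document images, and it uses fixed regular expressions as a stand-in for the trained execution module. The pattern table and both function names are hypothetical.

```python
import re

# Assumption: regex patterns stand in for a trained detector of sensitive data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def sanitize(text):
    """Replace sensitive spans with placeholders and report what changed."""
    changes = []
    for label, pattern in PATTERNS.items():
        def replace(match, label=label):
            changes.append((label, match.group(0)))
            return f"[{label}]"
        text = pattern.sub(replace, text)
    return text, changes


def record_feedback(training_set, original, sanitized, approved):
    """Store the user's verdict so a later training run could use it."""
    training_set.append(
        {"original": original, "sanitized": sanitized, "approved": approved}
    )
    return training_set
```

In the patent's setting the stored feedback would refine the execution module's behaviour on later documents; here it is only collected, since no learning component is included in the sketch.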
