CONTENT EXTRACTION BASED ON HOP DISTANCE WITHIN A GRAPH MODEL

    Publication No.: US20240153296A1

    Publication Date: 2024-05-09

    Application No.: US17983908

    Application Date: 2022-11-09

    Applicant: Paypal, Inc.

    Abstract: A method of categorizing text entries on a document can include determining, for each of a plurality of text bounding boxes in the document, respective text, respective coordinates, and respective input embeddings. The method may further include defining a graph of the plurality of bounding boxes, the graph comprising a plurality of connections among the plurality of bounding boxes, each connection comprising a first and second bounding box and zero or more respective intermediate bounding boxes. The method may further include determining a respective attention value for each connection according to a quantity of intermediate bounding boxes in the connection and, based on the respective attention values and a transformer-based machine learning model applied to the respective input embeddings and respective coordinates, determining output embeddings for each bounding box and, based on the respective output embeddings, generating a bounding box label for each bounding box.
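The hop-based attention value described in this abstract can be illustrated with a small sketch. The abstract states only that the value depends on the number of intermediate bounding boxes in a connection. The shortest-path interpretation, the exponential `decay` weighting, and the toy chain graph below are assumptions made for illustration, not the method claimed in the application.

```python
from collections import deque


def hop_distances(adjacency, start):
    """Breadth-first search: number of edges from start to each reachable box."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def attention_value(num_intermediate, decay=0.5):
    """Hypothetical weighting: fewer intermediate boxes give a larger value."""
    return decay ** num_intermediate


# Toy graph of four bounding boxes connected in a chain: 0 - 1 - 2 - 3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

attention = {}
for a in adjacency:
    for b, hops in hop_distances(adjacency, a).items():
        if a != b:
            # A connection spanning `hops` edges has hops - 1 intermediate boxes.
            attention[(a, b)] = attention_value(hops - 1)
```

In a transformer, values like these would typically enter the attention computation as a bias or mask; how the application combines them with the model is not stated in the abstract.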

    TEXT BLOCK SEGMENTATION
    Published Application

    Publication No.: US20240046677A1

    Publication Date: 2024-02-08

    Application No.: US17814856

    Application Date: 2022-07-26

    IPC Classification: G06V30/148 G06V30/18

    CPC Classification: G06V30/153 G06V30/18181

    Abstract: A computer-implemented method for text block segmentation includes determining a first text block segmentation pattern utilized to generate a segmented text block based, at least in part, on a comparison of semantic information associated with the segmented text block and a plurality of predefined types of text block segmentation patterns indicated by a graph; calculating a first degree of confidence in a size of the segmented text block based, at least in part, on comparing semantic entities associated with the segmented text block with semantic entities indicated by leaf nodes stemming from a first non-leaf node included in the graph and representative of the first type of text block segmentation pattern; and determining that the size of the segmented text block is non-optimal based on the calculated degree of confidence in the size of the segmented text block being below a predetermined threshold.
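The confidence check in this abstract can be sketched as a set comparison. The abstract does not say how the degree of confidence is computed. The overlap ratio, the `pattern_graph` contents, and the `THRESHOLD` value below are all hypothetical choices made for illustration.

```python
# Hypothetical graph: each non-leaf node is a segmentation pattern type,
# and its leaf nodes are the semantic entities expected for that pattern.
pattern_graph = {
    "address": ["street", "city", "postal_code", "country"],
    "contact": ["name", "phone", "email"],
}

THRESHOLD = 0.75  # assumed value for the predetermined threshold


def size_confidence(block_entities, leaf_entities):
    """Fraction of the pattern's expected entities that appear in the block."""
    expected = set(leaf_entities)
    if not expected:
        return 0.0
    return len(expected & set(block_entities)) / len(expected)


def is_size_non_optimal(pattern, block_entities):
    """True when confidence in the block size falls below the threshold."""
    return size_confidence(block_entities, pattern_graph[pattern]) < THRESHOLD
```

For example, a block tagged as an address that contains only a street and a city covers half of the expected entities, so this sketch would flag its size as non-optimal.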

    DIGITAL FORENSIC APPARATUS FOR SEARCHING RECOVERY TARGET AREA FOR LARGE-CAPACITY VIDEO EVIDENCE USING TIME MAP AND METHOD OF OPERATING THE SAME

    Publication No.: US20230343096A1

    Publication Date: 2023-10-26

    Application No.: US17975897

    Application Date: 2022-10-28

    Applicant: GMDSOFT Inc.

    Abstract: The present disclosure relates to technology for automatically searching and recovering the recovery area of frames corresponding to a desired time for large-capacity video evidence using a time map generated through an optical character recognition (OCR) function. A digital forensic apparatus for searching and recovering a recovery target area for large-capacity video evidence using a time map according to an embodiment of the present disclosure may include a division recovery device for collecting video evidence from a storage device, dividing the collected video evidence into a plurality of spaces in consideration of the physical space of the storage device, and recovering a representative frame in each of the divided spaces; a time information recognizer for recognizing time information from the recovered representative frame using an optical character recognition (OCR) function; a time map generator for generating a time map in which the divided spaces are arranged according to a time criterion based on the recognized time information; and a selective recovery device for searching a recovery target area by matching specific time information input by a user with the generated time map and recovering the searched recovery target area.
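The time map lookup described in this abstract can be sketched with a sorted list and a binary search. The OCR step is replaced here by hard-coded timestamps. The region identifiers, the function names, and the rule that a region covers the time from its representative frame up to the next region's frame are assumptions, not details from the disclosure.

```python
import bisect
from datetime import datetime


def build_time_map(regions):
    """Arrange (region_id, timestamp) pairs in time order."""
    return sorted(regions, key=lambda r: r[1])


def find_target_region(time_map, query):
    """Return the region whose representative frame is the latest at or before query."""
    times = [t for _, t in time_map]
    i = bisect.bisect_right(times, query) - 1
    return time_map[i][0] if i >= 0 else None


# Divided spaces in physical order; timestamps stand in for OCR results.
regions = [
    (0, datetime(2022, 10, 28, 12, 0)),
    (1, datetime(2022, 10, 28, 9, 0)),
    (2, datetime(2022, 10, 28, 10, 30)),
]
time_map = build_time_map(regions)
target = find_target_region(time_map, datetime(2022, 10, 28, 11, 0))
```

Only the region returned by the lookup would then need full recovery, which is the point of building the map before recovering the whole storage device.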

    SYSTEM AND METHOD FOR PROCESSING DOCUMENTS FOR ENHANCED SEARCH

    Publication No.: US20240290122A1

    Publication Date: 2024-08-29

    Application No.: US18175077

    Application Date: 2023-02-27

    Applicant: Innoplexus AG

    Abstract: A method for processing documents for enhanced search includes identifying a set of bounding boxes in the document. The method further includes defining one or more pairs of bounding boxes in the document. Each pair of bounding boxes is defined by a binary relation. The method further includes constructing a directed acyclic graph (DAG) from the one or more pairs of bounding boxes. The method further includes determining a topological sorting of each bounding box in the document based on the DAG. The topological sorting defines an adjacency relationship between the bounding boxes in the document. The method further includes extracting key-value pairs from the document based on the adjacency relationship between the bounding boxes in the document. The method further includes storing the key-value pairs in a key-value pair database.
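The pipeline in this abstract (binary relation, DAG, topological sort, key-value extraction) can be sketched with Python's standard `graphlib` module. The abstract does not define the binary relation or the extraction rule. The reading-order relation `precedes`, the trailing-colon heuristic for keys, and the sample boxes are assumptions, and a plain dictionary stands in for the key-value pair database.

```python
from graphlib import TopologicalSorter

# Hypothetical boxes: id -> (text, x, y), with y growing downward.
boxes = {
    "a": ("Invoice No:", 0, 0),
    "b": ("12345", 100, 0),
    "c": ("Date:", 0, 20),
    "d": ("2023-02-27", 100, 20),
}


def precedes(p, q):
    """Assumed binary relation: p is read before q (row first, then column)."""
    _, px, py = boxes[p]
    _, qx, qy = boxes[q]
    return (py, px) < (qy, qx)


# DAG as a mapping from each box to the set of boxes that must come before it.
predecessors = {q: {p for p in boxes if p != q and precedes(p, q)} for q in boxes}
order = list(TopologicalSorter(predecessors).static_order())

# Assumed heuristic: a box ending in ":" is a key and the next box is its value.
pairs = {}
for key_id, value_id in zip(order, order[1:]):
    key_text = boxes[key_id][0]
    if key_text.endswith(":"):
        pairs[key_text.rstrip(":")] = boxes[value_id][0]
```

Because the assumed relation is a strict total order over the box positions, the graph has no cycles and the topological order is unique. A looser relation would give a partial order with several valid sortings.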

    Graph machine learning for case similarity

    Publication No.: US12050522B2

    Publication Date: 2024-07-30

    Application No.: US17577711

    Application Date: 2022-01-18

    IPC Classification: G06F11/14 G06N3/04 G06V30/18

    Abstract: Herein is machine learning for anomalous graph detection based on graph embedding, shuffling, comparison, and unsupervised training techniques that can characterize an unfamiliar graph. In an embodiment, a computer obtains many known vectors that respectively represent known graphs. A new vector is generated that represents a new graph that contains multiple vertices. The new vector may contain an arithmetic aggregation of vertex vectors that respectively represent multiple vertices and/or a vector that represents a virtual vertex that is connected to the multiple vertices by respective virtual edges. In the many known vectors, some similar vectors that are similar to the new vector are identified. The new graph is automatically characterized based on a subset of the known graphs that the similar vectors represent.
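The aggregation and similarity search in this abstract can be sketched as follows. The abstract mentions an arithmetic aggregation of vertex vectors without naming one. The element-wise mean, the cosine similarity measure, the value of `k`, and the sample vectors are assumptions chosen for illustration.

```python
import math


def graph_vector(vertex_vectors):
    """Arithmetic aggregation (here the element-wise mean) of vertex vectors."""
    n = len(vertex_vectors)
    return [sum(column) / n for column in zip(*vertex_vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_similar(new_vec, known, k=2):
    """Names of the k known graphs whose vectors are closest to new_vec."""
    ranked = sorted(known, key=lambda name: cosine(new_vec, known[name]), reverse=True)
    return ranked[:k]


# Hypothetical known graph vectors, keyed by case name.
known = {
    "case_a": [1.0, 0.0],
    "case_b": [0.9, 0.1],
    "case_c": [0.0, 1.0],
}

new_vec = graph_vector([[1.0, 0.0], [0.8, 0.2]])
similar = most_similar(new_vec, known)
```

The new graph would then be characterized from whatever is known about the returned cases. For large collections, an approximate nearest-neighbor index would usually replace the full sort.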

    GRAPH MACHINE LEARNING FOR CASE SIMILARITY
    Published Application

    Publication No.: US20230229570A1

    Publication Date: 2023-07-20

    Application No.: US17577711

    Application Date: 2022-01-18

    IPC Classification: G06F11/14 G06V30/18 G06N3/04

    Abstract: Herein is machine learning for anomalous graph detection based on graph embedding, shuffling, comparison, and unsupervised training techniques that can characterize an unfamiliar graph. In an embodiment, a computer obtains many known vectors that respectively represent known graphs. A new vector is generated that represents a new graph that contains multiple vertices. The new vector may contain an arithmetic aggregation of vertex vectors that respectively represent multiple vertices and/or a vector that represents a virtual vertex that is connected to the multiple vertices by respective virtual edges. In the many known vectors, some similar vectors that are similar to the new vector are identified. The new graph is automatically characterized based on a subset of the known graphs that the similar vectors represent.

    GRAPH MACHINE LEARNING FOR CASE SIMILARITY
    Published Application

    Publication No.: US20240330130A1

    Publication Date: 2024-10-03

    Application No.: US18740689

    Application Date: 2024-06-12

    IPC Classification: G06F11/14 G06N3/04 G06V30/18

    Abstract: Herein is machine learning for anomalous graph detection based on graph embedding, shuffling, comparison, and unsupervised training techniques that can characterize an unfamiliar graph. In an embodiment, a computer obtains many known vectors that respectively represent known graphs. A new vector is generated that represents a new graph that contains multiple vertices. The new vector may contain an arithmetic aggregation of vertex vectors that respectively represent multiple vertices and/or a vector that represents a virtual vertex that is connected to the multiple vertices by respective virtual edges. In the many known vectors, some similar vectors that are similar to the new vector are identified. The new graph is automatically characterized based on a subset of the known graphs that the similar vectors represent.