-
Publication No.: US20210248153A1
Publication Date: 2021-08-12
Application No.: US16785329
Filing Date: 2020-02-07
IPC Classification: G06F16/25, G06F16/242, G06F16/21, G06F16/34, G06F16/332, G06F16/248
Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complementary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model. The complementary model is then deployed for use in analyzing documents.
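The aggregation step described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not specify the relevancy model or how word relationships are derived, so this sketch assumes TF-IDF as the relevancy model, treats words that co-occur in the same document as related, and weights each edge with the mean relevance score of its two endpoint words. The function names (relevance_scores, knowledge_graph, complementary_model) are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter
from itertools import combinations


def relevance_scores(documents):
    """Stand-in relevancy model (assumption): per-document TF x smoothed IDF, summed over the corpus."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents each word appears in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for word, count in counts.items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            scores[word] += (count / len(tokens)) * idf
    return dict(scores)


def knowledge_graph(documents):
    """Stand-in knowledge graph (assumption): an undirected edge joins words that co-occur in a document."""
    edges = set()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        # Sorting makes each edge a canonical (a, b) pair with a < b.
        edges.update(combinations(words, 2))
    return edges


def complementary_model(documents):
    """Keep the graph's nodes and weight each edge with the relevance scores of its endpoints."""
    scores = relevance_scores(documents)
    edges = knowledge_graph(documents)
    nodes = {word for edge in edges for word in edge}
    weights = {(a, b): (scores[a] + scores[b]) / 2 for a, b in edges}
    return nodes, weights
```

Averaging the endpoint scores is only one way to turn word-level relevance into edge weights; a product or maximum would fit the abstract's description equally well.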
-
Publication No.: US20230154220A1
Publication Date: 2023-05-18
Application No.: US18154665
Filing Date: 2023-01-13
Inventors: Scott CARRIER, Ritwik RAY, Jonathan Chapin RAND, Jothilakshmi SIRANGIMOORTHY, Hui WANG, Robert FREDENBURG
IPC Classification: G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
CPC Classification: G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection of at least one of the column headers, row headers, and data cells is received for at least one of the main element, conditional element, and value element in the initial set, producing a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine, which uses it to perform natural language processing of the document including the table.
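The table pre-processing flow described above can be sketched in Python as follows. This is a hypothetical illustration, not the claimed implementation: it assumes the first row holds the column headers and the first column holds the row headers, and it uses a placeholder heuristic (row header as main element, column header as conditional element, data cell as value element) to build the initial sets before a user selection overrides one or more roles. ElementSet, parse_table, initial_sets, and apply_user_selection are illustrative names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ElementSet:
    main: str         # entity whose value is to be extracted
    conditional: str  # element that refines the entity
    value: str        # value for the entity


def parse_table(rows):
    """Split a grid into column headers, row headers, and data cells.

    Assumes rows[0] is the header row and column 0 holds the row headers.
    """
    column_headers = rows[0][1:]
    row_headers = [row[0] for row in rows[1:]]
    data_cells = [row[1:] for row in rows[1:]]
    return column_headers, row_headers, data_cells


def initial_sets(rows):
    """Placeholder heuristic (assumption): one set per data cell."""
    column_headers, row_headers, data_cells = parse_table(rows)
    return [
        ElementSet(main=row_header, conditional=col_header, value=cell)
        for row_header, cells in zip(row_headers, data_cells)
        for col_header, cell in zip(column_headers, cells)
    ]


def apply_user_selection(element_set, **selection):
    """Return a modified set with the user's selections substituted for the named roles."""
    unknown = set(selection) - {"main", "conditional", "value"}
    if unknown:
        raise ValueError(f"unknown element role(s): {sorted(unknown)}")
    # The original set is immutable, so the initial set is preserved.
    return replace(element_set, **selection)
```

For example, with rows = [["Medication", "Adult dose", "Child dose"], ["Aspirin", "325 mg", "81 mg"]], initial_sets(rows)[0] is ElementSet(main='Aspirin', conditional='Adult dose', value='325 mg'). If the user instead selects the "Adult dose" header as the main element, apply_user_selection(sets[0], main="Adult dose", conditional="Aspirin") yields the modified set that would be passed to the NLP engine.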
-
Publication No.: US20220230012A1
Publication Date: 2022-07-21
Application No.: US17155077
Filing Date: 2021-01-21
Inventors: Scott CARRIER, Ritwik RAY, Jonathan Chapin RAND, Jothilakshmi SIRANGIMOORTHY, Hui WANG, Robert FREDENBURG
IPC Classification: G06K9/00, G06F40/40, G06F40/237, G06F3/0482
Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection of at least one of the column headers, row headers, and data cells is received for at least one of the main element, conditional element, and value element in the initial set, producing a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine, which uses it to perform natural language processing of the document including the table.
-
Publication No.: US20240096124A1
Publication Date: 2024-03-21
Application No.: US18518279
Filing Date: 2023-11-22
Inventors: Scott CARRIER, Ritwik RAY, Jonathan Chapin RAND, Jothilakshmi SIRANGIMOORTHY, Hui WANG, Robert FREDENBURG
IPC Classification: G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
CPC Classification: G06V30/412, G06F3/0482, G06F40/237, G06F40/40, G06V30/416
Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing (NLP). A graphical user interface (GUI) provides a representation of table items in a table in a document, including a set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. Graphical controls are rendered in the GUI to enable a user to select an element from the table to be the main element, conditional element, or value element. The set of the main element, conditional element, and value element is updated with the user-selected element to form a modified set. The modified set of the main element, conditional element, and value element is provided to an NLP engine to perform natural language processing.
-
Publication No.: US20210248303A1
Publication Date: 2021-08-12
Application No.: US16785374
Filing Date: 2020-02-07
IPC Classification: G06F40/106, G06F40/166, G06F40/197
Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving an unstructured document and a structured document including information extracted from the unstructured document and position information associated with the extracted information. The unstructured document is rendered in a first pane, and a graphical rendering of the structured document is rendered in a second pane. The graphical rendering may generally be a structure in which content from the structured document is displayed in a hierarchical format. Each element in the structured document is linked to the rendered unstructured document based on the position information included in the structured document.
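The linking of structured-document elements to the rendered unstructured document can be sketched in Python under several assumptions, none of which come from the abstract: the structured document is a nested dictionary, each extracted element records its text and its position as character offsets (start, end) into the unstructured text, and the hierarchical second pane is approximated by an indented text outline. flatten, render_hierarchy, and link_elements are hypothetical names.

```python
def flatten(element, path=()):
    """Walk the (hypothetical) nested structured document depth-first, yielding (path, element)."""
    yield path, element
    for index, child in enumerate(element.get("children", [])):
        yield from flatten(child, path + (index,))


def render_hierarchy(structured):
    """Approximate the second pane: one line per element, indented by depth."""
    return "\n".join("  " * len(path) + el["label"] for path, el in flatten(structured))


def link_elements(unstructured, structured):
    """Map each positioned element's path to its (start, end) span in the unstructured text.

    Elements without position information (e.g. grouping nodes) are skipped;
    a span whose text does not match the source raises ValueError.
    """
    links = {}
    for path, el in flatten(structured):
        if "start" in el:
            start, end = el["start"], el["end"]
            if unstructured[start:end] != el["text"]:
                raise ValueError(f"position mismatch for {el['label']!r}")
            links[path] = (start, end)
    return links
```

A viewer could use the returned paths to highlight the matching span in the first pane when an element is selected in the second pane. Character offsets are only one possible form of position information; page coordinates or bounding boxes would be equally plausible readings of the abstract.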