-
Publication No.: US11507593B2
Publication Date: 2022-11-22
Application No.: US17078005
Filing Date: 2020-10-22
Inventor: Manish Shrivastava, Vishnu Ramesh
IPC: G06F16/25, G06F16/2457, G06N20/00
Abstract: A system for generating a queryable structured document from an unstructured document using a machine learning model is provided. The system (i) identifies breakpoints in the unstructured document, (ii) segments the unstructured document into one or more fragments based on the identified breakpoints, (iii) classifies the one or more fragments as one or more title fragments or one or more non-title fragments based on a sequence of a position of words used in each fragment of the one or more fragments, (iv) constructs a data tree using the one or more title fragments and the one or more non-title fragments as nodes of the data tree, (v) assigns one or more vectors to each node of the data tree, and (vi) generates a structured document by providing a matrix representation for each node of the data tree.
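Illustrative sketch (not part of the patent record): steps (i) to (vi) of the abstract can be pictured roughly as below. Everything here is an assumption made for illustration. Blank lines stand in for breakpoints, the `is_title` heuristic stands in for the machine learning classifier, `embed` is a toy hashed bag-of-words vector, and the names `Node`, `segment`, `build_tree`, `assign_vectors`, and `to_matrix` are hypothetical. The abstract does not specify any of these details.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    is_title: bool
    children: list = field(default_factory=list)
    vector: list = field(default_factory=list)


def segment(document):
    # Steps (i)-(ii): assume blank lines are the breakpoints and split on them.
    return [frag.strip() for frag in document.split("\n\n") if frag.strip()]


def is_title(fragment):
    # Step (iii): hypothetical heuristic standing in for the trained classifier.
    # Short, single-line fragments without a trailing period count as titles.
    return (
        "\n" not in fragment
        and len(fragment.split()) <= 6
        and not fragment.endswith(".")
    )


def build_tree(fragments):
    # Step (iv): title fragments hang off the root; each non-title fragment
    # hangs off the most recent title (or the root if no title has been seen).
    root = Node("ROOT", True)
    current = root
    for frag in fragments:
        node = Node(frag, is_title(frag))
        if node.is_title:
            root.children.append(node)
            current = node
        else:
            current.children.append(node)
    return root


def embed(text, dim=8):
    # Toy hashed bag-of-words: each word increments one of `dim` buckets.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


def assign_vectors(node, dim=8):
    # Step (v): attach a vector to every node in the tree.
    node.vector = embed(node.text, dim)
    for child in node.children:
        assign_vectors(child, dim)


def to_matrix(node):
    # Step (vi): one possible matrix for a node, with the node's own vector
    # as the first row and one row per direct child after it.
    return [node.vector] + [child.vector for child in node.children]
```

A query could then be embedded with the same `embed` function and compared against the rows of each node's matrix, although the abstract does not say how querying is performed.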
-
Publication No.: US20210117438A1
Publication Date: 2021-04-22
Application No.: US17078005
Filing Date: 2020-10-22
Inventor: Manish Shrivastava, Vishnu Ramesh
IPC: G06F16/25, G06N20/00, G06F16/2457
Abstract: A system for generating a queryable structured document from an unstructured document using a machine learning model is provided. The system (i) identifies breakpoints in the unstructured document, (ii) segments the unstructured document into one or more fragments based on the identified breakpoints, (iii) classifies the one or more fragments as one or more title fragments or one or more non-title fragments based on a sequence of a position of words used in each fragment of the one or more fragments, (iv) constructs a data tree using the one or more title fragments and the one or more non-title fragments as nodes of the data tree, (v) assigns one or more vectors to each node of the data tree, and (vi) generates a structured document by providing a matrix representation for each node of the data tree.
-