-
Publication No.: US20140317113A1
Publication Date: 2014-10-23
Application No.: US13867776
Filing Date: 2013-04-22
Applicant: ABB RESEARCH LTD.
Inventor: David Neil Cox
IPC: G06F17/30
CPC classification number: G06F17/30598, G06F17/246, G06F17/30563
Abstract: One or more techniques and/or systems are provided for parsing tabular data of a document. That is, a document may comprise arbitrarily formatted content (e.g., an equipment inspection report generated by an engineer). Respective rows of the document may be clustered into one or more row clusters based upon row proximity and/or numeric content (e.g., rows having similar numeric content may comprise logically related information). One or more vertical clusters may be generated within respective row clusters based upon vertical overlap. In this way, row clusters and/or vertical clusters may be searched for one or more values that may be assigned to a search term. For example, a row cluster may comprise a search term “Average temp”. One or more vertical clusters within the row cluster may be searched for a word that matches a pattern criterion (e.g., a two-digit number), which may be assigned to the search term.
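The row-clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Row` type, the `cluster_rows` and `numeric_count` helpers, the `max_gap` threshold, and the rule that two rows have "similar numeric content" when they contain the same non-zero number of numeric tokens are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed definition of a numeric token, e.g. "72", "-3.5" or "1,200".
NUMERIC = re.compile(r"-?\d[\d,]*(\.\d+)?")


@dataclass
class Row:
    top: float        # y of the row's top edge (page units, growing downward)
    bottom: float     # y of the row's bottom edge
    words: list[str]  # the row's words in reading order


def numeric_count(row: Row) -> int:
    """Count the words in a row that look like numbers."""
    return sum(1 for w in row.words if NUMERIC.fullmatch(w))


def cluster_rows(rows: list[Row], max_gap: float = 4.0) -> list[list[Row]]:
    """Group rows, sorted top to bottom, into row clusters.

    Hypothetical rule: a row joins the current cluster when its vertical gap
    to the previous row is at most max_gap (proximity), or when both rows
    hold the same non-zero number of numeric tokens (numeric similarity).
    """
    clusters: list[list[Row]] = []
    for row in sorted(rows, key=lambda r: r.top):
        if clusters:
            prev = clusters[-1][-1]
            close = row.top - prev.bottom <= max_gap
            n_prev, n_row = numeric_count(prev), numeric_count(row)
            similar = n_prev > 0 and n_prev == n_row
            if close or similar:
                clusters[-1].append(row)
                continue
        clusters.append([row])
    return clusters


rows = [
    Row(0, 10, ["Inspection", "report"]),
    Row(30, 40, ["Average", "temp", "72", "75"]),
    Row(60, 70, ["Peak", "temp", "88", "91"]),  # 20 units below, joins via numeric content
    Row(90, 100, ["Notes:", "none"]),
]
for cluster in cluster_rows(rows):
    print([" ".join(r.words) for r in cluster])
# ['Inspection report']
# ['Average temp 72 75', 'Peak temp 88 91']
# ['Notes: none']
```

Comparing each row only with the previous row keeps the pass linear after sorting; a fuller implementation might instead compare a candidate row against every row already in the cluster.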
-
Publication No.: US09898523B2
Publication Date: 2018-02-20
Application No.: US13867776
Filing Date: 2013-04-22
Applicant: ABB Research Ltd.
Inventor: David Neil Cox
CPC classification number: G06F17/30598, G06F17/246, G06F17/30563
Abstract: One or more techniques and/or systems are provided for parsing tabular data of a document. That is, a document may comprise arbitrarily formatted content (e.g., an equipment inspection report generated by an engineer). Respective rows of the document may be clustered into one or more row clusters based upon row proximity and/or numeric content (e.g., rows having similar numeric content may comprise logically related information). One or more vertical clusters may be generated within respective row clusters based upon vertical overlap. In this way, row clusters and/or vertical clusters may be searched for one or more values that may be assigned to a search term. For example, a row cluster may comprise a search term “Average temp”. One or more vertical clusters within the row cluster may be searched for a word that matches a pattern criterion (e.g., a two-digit number), which may be assigned to the search term.
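The vertical-clustering and search steps in the abstract can be sketched in Python as well, again as an illustration under stated assumptions rather than the patented method. `Token`, `vertical_clusters` and `find_value` are hypothetical names. A vertical cluster is assumed to be a group of text runs within one row cluster whose horizontal extents overlap (a column), and the value for a search term is assumed to be the first run below the term in the same column that fully matches a regular expression, defaulting to a two-digit number.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str     # a text run, e.g. "Average temp" or "72"
    row: int      # index of the row within the row cluster
    left: float   # x of the run's left edge
    right: float  # x of the run's right edge


def vertical_clusters(tokens: list[Token]) -> list[list[Token]]:
    """Group tokens whose horizontal extents overlap into column-like clusters.

    Tokens are swept by left edge; a token joins the current cluster while
    its left edge does not pass the cluster's rightmost edge seen so far.
    """
    clusters: list[list[Token]] = []
    right_edge = float("-inf")
    for tok in sorted(tokens, key=lambda t: t.left):
        if clusters and tok.left <= right_edge:
            clusters[-1].append(tok)
            right_edge = max(right_edge, tok.right)
        else:
            clusters.append([tok])
            right_edge = tok.right
    return clusters


def find_value(tokens: list[Token], term: str, pattern: str = r"\d{2}") -> Optional[str]:
    """Return the first token below `term` in the same vertical cluster
    whose text fully matches `pattern`, or None if there is none."""
    regex = re.compile(pattern)
    for cluster in vertical_clusters(tokens):
        header = next((t for t in cluster if t.text == term), None)
        if header is None:
            continue
        for tok in sorted(cluster, key=lambda t: t.row):
            if tok.row > header.row and regex.fullmatch(tok.text):
                return tok.text
    return None


cluster = [
    Token("Unit", 0, 0, 30), Token("Average temp", 0, 50, 120), Token("Status", 0, 140, 180),
    Token("A1", 1, 0, 20), Token("72", 1, 70, 85), Token("OK", 1, 145, 165),
]
print(find_value(cluster, "Average temp"))  # 72
```

Tracking the running right edge merges chains of overlapping extents, so a wide header and a narrower value beneath it land in one cluster even when their edges do not line up.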
-