Knowledge aided feature engineering

    Publication No.: US11599826B2

    Publication date: 2023-03-07

    Application No.: US16741084

    Filing date: 2020-01-13

    IPC classification: G06N20/00 G06F11/34

    Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
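
The abstract describes the augmentation step only at claim level. A minimal sketch of that step follows, assuming tabular data held as lists of dicts, a shared join key, and a hand-written relatedness table (`RELATED_TERMS`, hypothetical) standing in for whatever knowledge source decides which features are semantically related. Choosing the important features and training the model are out of scope; this is an illustration, not the patented implementation.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical stand-in for a knowledge source: maps a feature name to
# column names assumed to be semantically related to it.
RELATED_TERMS: Dict[str, Set[str]] = {
    "income": {"salary", "earnings"},
    "age": {"birth_year"},
}


def find_related_features(selected: List[str], candidate_columns: List[str],
                          related: Dict[str, Set[str]]) -> List[str]:
    """Return the second-dataset columns related to any selected feature."""
    return [col for col in candidate_columns
            if any(col in related.get(f, set()) for f in selected)]


def augment(first: List[Row], second: List[Row], key: str,
            new_features: List[str]) -> List[Row]:
    """Copy the new features from `second` onto `first`, joined on `key`."""
    lookup = {row[key]: row for row in second}
    augmented_rows = []
    for row in first:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for f in new_features:
            merged[f] = match.get(f)  # None if no matching row exists
        augmented_rows.append(merged)
    return augmented_rows


# Toy data; "income" is assumed to be the feature that drives performance.
first = [{"id": 1, "income": 50, "label": 1},
         {"id": 2, "income": 20, "label": 0}]
second = [{"id": 1, "salary": 52, "zip": "10001"},
          {"id": 2, "salary": 21, "zip": "94105"}]
new_features = find_related_features(["income"], ["id", "salary", "zip"], RELATED_TERMS)
augmented = augment(first, second, "id", new_features)
print(augmented)
# The augmented rows would then be handed to the chosen model's training routine.
```

Only `salary` is pulled in, because the relatedness table links it to `income`; the unrelated `zip` column is ignored.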

    7. Method and Apparatus for Identifying the Optimal Schema to Store Graph Data in a Relational Store
    Patent application; status: pending (published)

    Publication No.: US20160203236A1

    Publication date: 2016-07-14

    Application No.: US15078931

    Filing date: 2016-03-23

    IPC classification: G06F17/30

    CPC classification: G06F16/9024 G06F16/211

    Abstract: A system for identifying a schema for storing graph data includes a database containing a graph dataset of data and relationships between data pairs and a list of storage methods that each are a distinct structural arrangement of the data and relationships from the graph data set. An analyzer module collects statistics for the graph dataset, and a data classification module uses the collected statistics to calculate metrics describing the data and relationships in the graph dataset, uses the calculated metrics to group the data and relationships into a plurality of graph dataset subsets and associates each graph dataset subset with one of the plurality of storage methods. The resulting group of storage methods associated with the plurality of graph dataset subsets includes a unique storage method for each graph dataset subset. The data and relationships in each graph dataset subset are arranged in accordance with associated storage methods.
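
The abstract leaves both the metrics and the storage methods unspecified. A minimal sketch of the statistics-then-classify flow follows, assuming RDF-style (subject, predicate, object) triples grouped by predicate. The two metrics (fan-out and coverage), the 0.5 coverage threshold, and the three storage-method names are hypothetical choices for illustration, not those of the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical storage methods, each a different relational layout.
ENTITY_COLUMN = "entity_table_column"    # one column per predicate
PREDICATE_TABLE = "predicate_table"      # separate (subject, object) table
TRIPLE_TABLE = "generic_triple_table"    # fallback (s, p, o) table


def collect_statistics(triples: List[Triple]) -> Dict[str, Tuple[int, int]]:
    """Per predicate: (edge count, distinct subject count)."""
    edges: Dict[str, int] = defaultdict(int)
    subjects: Dict[str, Set[str]] = defaultdict(set)
    for s, p, _ in triples:
        edges[p] += 1
        subjects[p].add(s)
    return {p: (edges[p], len(subjects[p])) for p in edges}


def assign_storage(stats: Dict[str, Tuple[int, int]], total_subjects: int,
                   min_coverage: float = 0.5) -> Dict[str, List[str]]:
    """Group predicates by two assumed metrics; each group gets one method.

    fan_out  = edges / distinct subjects   (multi-valuedness)
    coverage = distinct subjects / all subjects   (density)
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for p, (n_edges, n_subjects) in stats.items():
        fan_out = n_edges / n_subjects
        coverage = n_subjects / total_subjects
        if fan_out > 1:
            groups[PREDICATE_TABLE].append(p)
        elif coverage >= min_coverage:
            groups[ENTITY_COLUMN].append(p)
        else:
            groups[TRIPLE_TABLE].append(p)
    return dict(groups)


triples = [("a", "name", "A"), ("b", "name", "B"),
           ("a", "knows", "b"), ("a", "knows", "c"),
           ("c", "nickname", "C")]
stats = collect_statistics(triples)
layout = assign_storage(stats, total_subjects=len({s for s, _, _ in triples}))
print(layout)
```

Grouping by method rather than by predicate mirrors the abstract's requirement that each subset has its own storage method: every predicate lands in exactly one group, and no two groups share a method.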

    8. Method and Apparatus for Storing Sparse Graph Data as Multi-Dimensional Cluster
    Patent application; status: in force

    Publication No.: US20150052134A1

    Publication date: 2015-02-19

    Application No.: US13967261

    Filing date: 2013-08-14

    IPC classification: G06F17/30

    Abstract: A system for storing graph data as a multi-dimensional cluster having a database with a graph dataset containing data and relationships between data pairs and a schema list of storage methods that use a table with columns and rows associated with data or relationships. An analyzer module to collect statistics of a graph dataset and a dimension identification module to identify a plurality of dimensions that each represent a column in the table. A schema creation and loading module creates a modified storage method and having a plurality of distinct table blocks and a plurality of table block indexes, one index for each table block and arranges the data and relationships in the given graph dataset in accordance with the modified storage method to create the multi-dimensional cluster.
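
The abstract does not say how dimensions are chosen or what each per-block index holds. The sketch below assumes a simple heuristic (columns with at most `max_distinct` distinct values become dimensions) and, for each block, an index from a key column to row offsets inside that block; `identify_dimensions` and `build_clusters` are hypothetical names. It shows only the general multi-dimensional clustering idea of co-locating rows that share dimension values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]
Cell = Tuple[str, ...]


def identify_dimensions(rows: List[Row], columns: List[str],
                        max_distinct: int = 2) -> List[str]:
    """Assumed heuristic: low-cardinality columns become clustering dimensions."""
    dims = []
    for col in columns:
        distinct = {r[col] for r in rows if col in r}
        if 0 < len(distinct) <= max_distinct:
            dims.append(col)
    return dims


def build_clusters(rows: List[Row], dims: List[str], key_col: str
                   ) -> Tuple[Dict[Cell, List[Row]], Dict[Cell, Dict[str, List[int]]]]:
    """Put each row in the block for its dimension-value cell, then build one
    index per block mapping a key value to row offsets within that block."""
    blocks: Dict[Cell, List[Row]] = defaultdict(list)
    for r in rows:
        blocks[tuple(r.get(d) for d in dims)].append(r)
    indexes: Dict[Cell, Dict[str, List[int]]] = {}
    for cell, block in blocks.items():
        idx: Dict[str, List[int]] = defaultdict(list)
        for offset, r in enumerate(block):
            idx[r[key_col]].append(offset)
        indexes[cell] = dict(idx)
    return dict(blocks), indexes


rows = [{"subject": "s1", "type": "Person", "country": "US"},
        {"subject": "s2", "type": "Person", "country": "FR"},
        {"subject": "s3", "type": "Org", "country": "US"},
        {"subject": "s4", "type": "Person", "country": "US"}]
dims = identify_dimensions(rows, ["subject", "type", "country"])
blocks, indexes = build_clusters(rows, dims, key_col="subject")
# A lookup restricted to one cell only has to touch that cell's block.
print(indexes[("Person", "US")])
```

With this toy data, `subject` is excluded as a dimension because it has four distinct values, while `type` and `country` each have two.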

    9. Scalable Ontology Extraction
    Patent application; status: lapsed

    Publication No.: US20130024406A1

    Publication date: 2013-01-24

    Application No.: US13625931

    Filing date: 2012-09-25

    IPC classification: G06F15/18

    CPC classification: G06N5/025 G06F19/00

    Abstract: Techniques for facilitating learning of one or more ontological rules of a resource description framework database are provided. The techniques include obtaining ontology vocabulary from a resource description framework database, generating a rule hypothesis by incrementally building upon a previously learnt rule from the database by adding one or more predicates to the previously learnt rule, performing a constraint check on the generated rule hypothesis by determining compatibility with each previously learnt rule to ensure that a complete rule set including each previously learnt rule and the generated rule hypothesis is consistent, validating the rule hypothesis as a rule using one or more association rule mining techniques to determine validity of the rule hypothesis against the database, and applying the rule to the database to infer one or more facts from the database to facilitate learning of one or more additional ontological rules.
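
A minimal sketch of the generate, check, validate, and apply loop follows. It assumes rules restricted to unary class membership (for example, Employee(x) → Taxpayer(x)) rather than general RDF predicates, a hypothetical list of disjoint classes as the consistency constraint, and the standard association-rule measures support (entities matching both body and head) and confidence (support divided by entities matching the body). Class names, thresholds, and helper names are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]               # (entity, class): an rdf:type statement
Rule = Tuple[FrozenSet[str], str]    # body classes imply the head class


def types_by_entity(facts: Set[Fact]) -> Dict[str, Set[str]]:
    types: Dict[str, Set[str]] = {}
    for entity, cls in facts:
        types.setdefault(entity, set()).add(cls)
    return types


def extend(rule: Rule, vocabulary: Set[str]) -> List[Rule]:
    """Hypotheses formed by adding one more body predicate to a rule."""
    body, head = rule
    return [(body | {c}, head) for c in sorted(vocabulary)
            if c not in body and c != head]


def consistent(hyp: Rule, learnt: List[Rule], disjoint: List[Tuple[str, str]]) -> bool:
    """Reject a hypothesis if, together with learnt rules that fire on the same
    body, it would place an entity in two disjoint classes."""
    body, head = hyp
    implied = {head} | {h for b, h in learnt if b <= body}
    classes = set(body) | implied
    return not any(a in classes and b in classes for a, b in disjoint)


def support_confidence(hyp: Rule, facts: Set[Fact]) -> Tuple[int, float]:
    body, head = hyp
    types = types_by_entity(facts)
    matches = [e for e, cs in types.items() if body <= cs]
    hits = [e for e in matches if head in types[e]]
    return len(hits), (len(hits) / len(matches) if matches else 0.0)


def learn(seed: Rule, facts: Set[Fact], learnt: List[Rule],
          disjoint: List[Tuple[str, str]], min_support: int = 2,
          min_conf: float = 0.6) -> List[Rule]:
    vocabulary = {c for _, c in facts}
    accepted: List[Rule] = []
    for hyp in extend(seed, vocabulary):
        if not consistent(hyp, learnt + accepted, disjoint):
            continue  # constraint check failed
        support, conf = support_confidence(hyp, facts)
        if support >= min_support and conf >= min_conf:
            accepted.append(hyp)  # validated as a rule
    return accepted


def apply_rules(rules: List[Rule], facts: Set[Fact]) -> Set[Fact]:
    """Infer new facts: every entity matching a rule body gets its head class."""
    types = types_by_entity(facts)
    inferred = set(facts)
    for body, head in rules:
        inferred |= {(e, head) for e, cs in types.items() if body <= cs}
    return inferred


facts = {("alice", "Person"), ("alice", "Employee"), ("alice", "Taxpayer"),
         ("bob", "Person"), ("bob", "Employee"), ("bob", "Taxpayer"),
         ("carol", "Person"), ("carol", "Employee"),
         ("dave", "Person"), ("dave", "Minor")}
learnt = [(frozenset({"Minor"}), "Dependent")]
disjoint = [("Dependent", "Taxpayer")]
accepted = learn((frozenset(), "Taxpayer"), facts, learnt, disjoint)
inferred = apply_rules(accepted, facts)
```

Here Minor → Taxpayer fails the constraint check, Person → Taxpayer fails on confidence (2 of 4), and Employee → Taxpayer is accepted (2 of 3), so carol is inferred to be a Taxpayer. The enlarged fact set could seed the next round of rule learning.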
