Data virtualization across heterogeneous formats

    Publication No.: US10740304B2

    Publication date: 2020-08-11

    Application No.: US14467640

    Filing date: 2014-08-25

    IPC classification: G06F16/21 G06F16/22

    Abstract: Various embodiments virtualize data across heterogeneous formats. In one embodiment, a plurality of heterogeneous data sources is received as input. A local schema graph including a set of attribute nodes and a set of type nodes is generated for each of the plurality of heterogeneous data sources. A global schema graph is generated based on each local schema graph that has been generated. The global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs. The edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes, based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
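
The linking step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the dictionary layout, the function names, the use of Jaccard similarity, and the 0.5 threshold are all assumptions chosen for illustration, and type-node edges are left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def local_schema_graph(source_name, columns):
    """Toy local schema graph: one attribute node per column plus its type node."""
    graph = {"attributes": {}, "types": set()}
    for attr, values in columns.items():
        type_name = type(values[0]).__name__ if values else "unknown"
        graph["attributes"][(source_name, attr)] = {"type": type_name, "values": values}
        graph["types"].add((source_name, type_name))
    return graph


def global_schema_graph(local_graphs, threshold=0.5):
    """Union the local graphs and link similar attribute nodes across sources."""
    attributes = {}
    for g in local_graphs:
        attributes.update(g["attributes"])
    edges = []
    for (n1, d1), (n2, d2) in combinations(attributes.items(), 2):
        if n1[0] == n2[0]:
            continue  # only link nodes that come from different sources
        score = jaccard(d1["values"], d2["values"])
        if score >= threshold:  # assumed cut-off, not taken from the patent
            edges.append((n1, n2, score))
    return {"attributes": attributes, "edges": edges}


# Hypothetical sources: a CSV export and a JSON feed that share customer ids.
customers = local_schema_graph("crm_csv", {"cust_id": [1, 2, 3, 4], "city": ["Oslo", "Rome"]})
orders = local_schema_graph("orders_json", {"customer": [2, 3, 4, 5], "total": [9.5, 20.0]})
merged = global_schema_graph([customers, orders])
```

Here `merged["edges"]` holds a single edge between `cust_id` and `customer`, because three of the five distinct values overlap. A real system would likely sample values and combine several similarity measures rather than compare full value sets.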

    Optimizing sparse schema-less data in data stores

    Publication No.: US10360262B2

    Publication date: 2019-07-23

    Application No.: US15631550

    Filing date: 2017-06-23

    IPC classification: G06F16/21 G06F16/901

    Abstract: Various embodiments of the invention relate to optimizing storage of schema-less data. At least one of a schema-less dataset including a plurality of resources and one or more query workloads associated with the plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each pair of nodes that represents a pair of co-occurring properties. A schema is generated based on the graph that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph.
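
One way to picture the graph-to-schema step is greedy graph coloring, where each color stands for a column identifier and co-occurring properties never share one. The abstract does not name an assignment algorithm, so the coloring approach, the function names, and the sample resources below are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(resources):
    """Nodes are properties; an edge joins two properties seen on the same resource."""
    graph = defaultdict(set)
    for props in resources.values():
        for p in props:
            graph[p]  # make sure isolated properties still get a node
        for p, q in combinations(sorted(props), 2):
            graph[p].add(q)
            graph[q].add(p)
    return graph


def assign_columns(graph):
    """Greedy coloring: co-occurring properties never share a column identifier."""
    columns = {}
    # Visit high-degree properties first; break ties by name for determinism.
    for prop in sorted(graph, key=lambda p: (-len(graph[p]), p)):
        used = {columns[n] for n in graph[prop] if n in columns}
        col = 0
        while col in used:
            col += 1
        columns[prop] = col
    return columns


# Hypothetical schema-less resources and their properties.
resources = {
    "alice": {"name", "email"},
    "book1": {"title", "isbn"},
    "bob": {"name", "phone"},
}
graph = cooccurrence_graph(resources)
columns = assign_columns(graph)
```

In this example five properties fit into two columns, since `email` and `phone` never appear on the same resource and can share a column. Greedy coloring is not guaranteed to find the minimum number of columns; it is used here because it is short and deterministic.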

    Method and apparatus for storing sparse graph data as multi-dimensional cluster
    Type: Granted patent (status: in force)

    Publication No.: US09323825B2

    Publication date: 2016-04-26

    Application No.: US13967261

    Filing date: 2013-08-14

    IPC classification: G06F17/30

    Abstract: A system for storing graph data as a multi-dimensional cluster has a database with a graph dataset containing data and relationships between data pairs, and a schema list of storage methods that use a table with columns and rows associated with data or relationships. An analyzer module collects statistics of the graph dataset, and a dimension identification module identifies a plurality of dimensions that each represent a column in the table. A schema creation and loading module creates a modified storage method having a plurality of distinct table blocks and a plurality of table block indexes, one index for each table block, and arranges the data and relationships in the given graph dataset in accordance with the modified storage method to create the multi-dimensional cluster.
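
The block-and-index layout in this abstract can be sketched roughly as follows. The abstract does not say how dimensions are chosen or what a block index contains, so the lowest-cardinality heuristic, the per-block subject index, and all names below are assumptions.

```python
from collections import defaultdict


def pick_dimensions(rows, columns, max_dims=2):
    """Assumed heuristic: cluster on the columns with the fewest distinct values."""
    cardinality = {c: len({row[c] for row in rows}) for c in columns}
    return sorted(columns, key=lambda c: (cardinality[c], c))[:max_dims]


def build_cluster(rows, dimensions, key_column):
    """Group rows into table blocks keyed by dimension values, one index per block."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[d] for d in dimensions)].append(row)
    indexes = {}
    for block_key, block_rows in blocks.items():
        index = defaultdict(list)
        for offset, row in enumerate(block_rows):
            index[row[key_column]].append(offset)  # key value -> offsets in block
        indexes[block_key] = dict(index)
    return dict(blocks), indexes


# Hypothetical graph data flattened into table rows.
rows = [
    {"subject": "alice", "predicate": "knows", "object_kind": "entity", "object": "bob"},
    {"subject": "alice", "predicate": "age", "object_kind": "literal", "object": "31"},
    {"subject": "bob", "predicate": "knows", "object_kind": "entity", "object": "carol"},
    {"subject": "bob", "predicate": "age", "object_kind": "literal", "object": "27"},
]
dims = pick_dimensions(rows, ["subject", "predicate", "object_kind", "object"])
blocks, indexes = build_cluster(rows, dims, key_column="subject")
```

A lookup such as "all `knows` edges for `bob`" then touches only the `("entity", "knows")` block and uses that block's index to find the row offset, instead of scanning the whole table.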

    Method and Apparatus for Identifying the Optimal Schema to Store Graph Data in a Relational Store
    Type: Patent application (status: in force)

    Publication No.: US20150052175A1

    Publication date: 2015-02-19

    Application No.: US13967031

    Filing date: 2013-08-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30958 G06F17/30292

    Abstract: A system for identifying a schema for storing graph data includes a database containing a graph dataset of data and relationships between data pairs and a list of storage methods that each are a distinct structural arrangement of the data and relationships from the graph dataset. An analyzer module collects statistics for the graph dataset, and a data classification module uses the collected statistics to calculate metrics describing the data and relationships in the graph dataset, uses the calculated metrics to group the data and relationships into a plurality of graph dataset subsets, and associates each graph dataset subset with one of the plurality of storage methods. The resulting group of storage methods associated with the plurality of graph dataset subsets includes a unique storage method for each graph dataset subset. The data and relationships in each graph dataset subset are arranged in accordance with associated storage methods.
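
The statistics, metrics, grouping, and association pipeline in this abstract can be sketched as below. The two metrics (coverage and fan-out), the thresholds, the subset names, and the storage-method names are hypothetical; the abstract does not specify any of them.

```python
from collections import Counter, defaultdict


def collect_statistics(triples):
    """Per-predicate statistics: triple count and number of distinct subjects."""
    counts = Counter(p for _, p, _ in triples)
    subjects = defaultdict(set)
    for s, p, _ in triples:
        subjects[p].add(s)
    return {p: {"count": counts[p], "subjects": len(subjects[p])} for p in counts}


def classify(stats, total_subjects):
    """Group predicates into subsets using two assumed metrics."""
    subsets = defaultdict(list)
    for pred, st in sorted(stats.items()):
        coverage = st["subjects"] / total_subjects  # share of subjects using it
        fan_out = st["count"] / st["subjects"]      # average values per subject
        if fan_out > 1.0:
            subsets["multi_valued"].append(pred)
        elif coverage >= 0.5:
            subsets["dense"].append(pred)
        else:
            subsets["sparse"].append(pred)
    return dict(subsets)


# Hypothetical storage methods: one distinct method per subset.
STORAGE_METHODS = {
    "dense": "property_table",
    "multi_valued": "edge_table",
    "sparse": "triple_table",
}

triples = [
    ("alice", "name", "Alice"), ("bob", "name", "Bob"), ("carol", "name", "Carol"),
    ("alice", "knows", "bob"), ("alice", "knows", "carol"),
    ("carol", "nickname", "Caz"),
]
stats = collect_statistics(triples)
subsets = classify(stats, total_subjects=len({s for s, _, _ in triples}))
plan = {name: (STORAGE_METHODS[name], preds) for name, preds in subsets.items()}
```

With this toy data, `name` lands in a wide property table, `knows` in an edge table, and `nickname` in a generic triple table, so each subset ends up with its own storage method.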

    Scalable Summarization of Data Graphs
    Type: Patent application (status: in force)

    Publication No.: US20140143280A1

    Publication date: 2014-05-22

    Application No.: US13682245

    Filing date: 2012-11-20

    IPC classification: G06F17/30

    CPC classification: G06F17/30292

    Abstract: Keyword searching is used to explore and search large Resource Description Framework (RDF) datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying RDF data. Given a keyword query, the summarization lends significant pruning power to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.
