Efficient column based data encoding for large-scale data storage
    1.
    Granted invention patent
    Legal status: In force

    Publication No.: US08452737B2

    Publication date: 2013-05-28

    Application No.: US13347367

    Filing date: 2012-01-10

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column-based data encoding in which raw data to be compressed is organized by columns. As first and second layers of data size reduction, dictionary encoding and/or value encoding are applied to the data as organized by columns, creating integer sequences that correspond to the columns. Next, a hybrid greedy run-length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.

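The layering in this abstract (dictionary encoding into integers, then a mix of run-length encoding and bit packing) can be illustrated with a minimal Python sketch. This is not the patented algorithm: the function names, the segment tuples, and the fixed `min_run` threshold are assumptions made for illustration, and the threshold stands in for the abstract's "analysis of bit savings". The packed segments only record the bit width they would need rather than packing real bits.

```python
from itertools import groupby


def dictionary_encode(column):
    """Map each distinct value to a small integer ID in first-seen order."""
    ids = {}
    encoded = [ids.setdefault(value, len(ids)) for value in column]
    return encoded, list(ids)  # list(ids) is the ID -> value lookup table


def hybrid_compress(ids, min_run=4):
    """Greedy pass: runs of at least `min_run` become ("rle", id, count);
    everything in between is grouped into ("packed", bits, [ids])."""
    segments, pending = [], []

    def flush():
        # Close the current bit-packed segment, if any.
        if pending:
            bits = max(1, max(pending).bit_length())
            segments.append(("packed", bits, list(pending)))
            pending.clear()

    for value, group in groupby(ids):
        count = len(list(group))
        if count >= min_run:
            flush()
            segments.append(("rle", value, count))
        else:
            pending.extend([value] * count)
    flush()
    return segments


def decompress(segments, lookup):
    """Rebuild the original column from segments and the lookup table."""
    out = []
    for seg in segments:
        if seg[0] == "rle":
            out.extend([lookup[seg[1]]] * seg[2])
        else:
            out.extend(lookup[i] for i in seg[2])
    return out


column = ["WA"] * 5 + ["OR", "CA", "OR"] + ["CA"] * 4
ids, lookup = dictionary_encode(column)
segments = hybrid_compress(ids)
# segments == [("rle", 0, 5), ("packed", 2, [1, 2, 1]), ("rle", 2, 4)]
```

A real implementation would choose between the two encodings by comparing the bits each would consume, rather than by a fixed run length.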

    Multidimensional data cubes with high-cardinality attributes
    2.
    Granted invention patent
    Legal status: In force

    Publication No.: US08380748B2

    Publication date: 2013-02-19

    Application No.: US12042674

    Filing date: 2008-03-05

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30592

    Abstract: Computer-readable media, systems, and methods for building a multidimensional data cube having one or more high-cardinality attributes are described. In embodiments, data is extracted from one or more databases. It is determined that one or more instances of the data are fact data and one or more instances of the data are dimension data. Each member of the fact data is one instance of a dimension and each instance of the dimension data includes an attribute for grouping the fact data. Moreover, in embodiments it is determined that one or more instances of the dimension data are high-cardinality attributes. The one or more high-cardinality attributes are processed with fact data and stored in fact tables on a computer storage medium.

    PROCESSING RECORDS IN DYNAMIC RANGES
    3.
    Invention patent application
    Legal status: In force

    Publication No.: US20120271845A1

    Publication date: 2012-10-25

    Application No.: US13092978

    Filing date: 2011-04-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30454 G06F17/30412

    Abstract: A scalable analysis system is described herein that performs common data analysis operations such as distinct counts and data grouping in a more scalable and efficient manner. The system allows distinct counts and data grouping to be applied to large datasets with predictable growth in the cost of the operation. The system dynamically partitions data based on the actual data distribution, which provides both scalability and uncompromised performance. The system sets a budget of available memory or other resources to use for the operation. As the operation progresses, the system determines whether the budget of memory is nearing exhaustion. Upon detecting that the memory used is near the limit, the system dynamically partitions the data. If the system still detects memory pressure, then the system partitions again, until a partition level is identified that fits within the memory budget.

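The budget-then-repartition loop in this abstract can be sketched for a distinct count as follows. This is a simplified illustration under stated assumptions, not the claimed method: the function name `distinct_count`, the use of hash partitions instead of ranges derived from the data distribution, and the choice to restart with twice as many partitions are all hypothetical. The budget is expressed as a number of values held in memory at once.

```python
def distinct_count(read_values, budget):
    """Count distinct values while holding at most `budget` of them in memory.

    `read_values` is a zero-argument callable that returns a fresh iterator
    over the data, because every partition needs its own pass.
    Returns (distinct count, number of partitions that fit the budget).
    """
    if budget < 1:
        raise ValueError("budget must allow at least one value")
    partitions = 1
    while True:
        total, fits = 0, True
        for p in range(partitions):
            seen = set()
            for value in read_values():
                if hash(value) % partitions == p:
                    seen.add(value)
                    if len(seen) > budget:
                        fits = False  # memory budget exhausted
                        break
            if not fits:
                break
            total += len(seen)
        if fits:
            return total, partitions
        partitions *= 2  # still under pressure: partition again and retry


data = list(range(100)) * 3
count, used = distinct_count(lambda: iter(data), budget=30)
```

The trade-off shown here is more passes over the data in exchange for a bounded working set; a production system would partition incrementally instead of restarting.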

    Random access in run-length encoded structures
    4.
    Granted invention patent
    Legal status: In force

    Publication No.: US07952499B1

    Publication date: 2011-05-31

    Application No.: US12696226

    Filing date: 2010-01-29

    IPC classification: H03M7/46

    CPC classification: H03M7/46

    Abstract: Random access to run-length encoded data values is provided. A target value is identified by a logical index into a structure of run-length-encoded values. To access the value, a bookmark is selected based on the logical index, on a maximum logical index of the bookmark, and on a specified bookmark distance. An initial run in the structure is located, based on the selected bookmark. A final run is chosen, at most one bookmark distance from the initial run. The target value is the value of the final run. Efficiency heuristics are used when generating bookmarks or creating the structure of run-length-encoded values.

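A bookmark scheme of this kind can be sketched in a few lines of Python. The layout below is an assumption made for illustration, not the structure claimed in the patent: runs are `(value, length)` pairs with positive lengths, and one bookmark is stored for every multiple of `distance`, holding the index of the run that covers that logical position and the logical start of that run.

```python
def build_bookmarks(runs, distance):
    """For every multiple of `distance`, record (run index, logical start)
    of the run that covers that logical position."""
    bookmarks, start = [], 0
    for i, (_, length) in enumerate(runs):
        end = start + length
        while len(bookmarks) * distance < end:
            bookmarks.append((i, start))
        start = end
    return bookmarks


def value_at(runs, bookmarks, distance, index):
    """Return the value at logical `index` by jumping to the nearest
    preceding bookmark and scanning forward less than `distance` positions."""
    if index < 0:
        raise IndexError("negative index")
    run, start = bookmarks[index // distance]  # initial run
    while index >= start + runs[run][1]:       # walk to the final run
        start += runs[run][1]
        run += 1
    return runs[run][0]


runs = [("a", 3), ("b", 5), ("c", 2)]  # logical data: aaabbbbbcc
bookmarks = build_bookmarks(runs, distance=4)
# bookmarks == [(0, 0), (1, 3), (2, 8)]
```

A smaller `distance` shortens the forward scan at the cost of more bookmark storage, which is the kind of trade-off the abstract's efficiency heuristics would tune.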

    EXPLAINING CHANGES IN MEASURES THRU DATA MINING
    6.
    Invention patent application
    Legal status: In force

    Publication No.: US20090012919A1

    Publication date: 2009-01-08

    Application No.: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F15/18

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies are described for identifying factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space, to detect interesting subspaces of transactions. Subsequently, subspaces that show strong variance between two slices can be selected, followed by grouping the subspaces into sub-reports to measure the coverage for each sub-report. A final report can then be generated that contains a list of the sub-reports detected in the previous acts.

    Extensions for adding and removing calculated members in a multidimensional database
    7.
    Granted invention patent
    Legal status: In force

    Publication No.: US07328206B2

    Publication date: 2008-02-05

    Application No.: US10624726

    Filing date: 2003-07-23

    IPC classification: G06F17/30

    Abstract: Systems, clients, servers, methods, and computer-readable media of varying scope are described in which two multidimensional database query language extensions, AddCalculatedMembers and StripCalculatedMembers, allow an OLAP client to easily control the integration of calculated members into the results of OLAP database queries. The OLAP client need not be aware of the details of which calculated members are defined within the multidimensional database and need not explicitly request the inclusion or removal of each calculated member from the output data set of the query.

    Systems and methods for proactive caching utilizing OLAP variants
    8.
    Granted invention patent
    Legal status: In force

    Publication No.: US07269581B2

    Publication date: 2007-09-11

    Application No.: US10402000

    Filing date: 2003-03-28

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: The present invention leverages MOLAP performance for ROLAP objects (dimensions, partitions and aggregations) by building, in a background process, a MOLAP equivalent of that object. When the background processing completes, queries are switched from ROLAP queries to MOLAP queries. When changes occur to relevant relational objects (such as tables that define content of OLAP objects), an OLAP object is switched back to a ROLAP mode, and all relevant caches are dropped while, as a background process, a new MOLAP equivalent is created.

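The mode switching in this abstract can be modeled as a small state machine. The Python sketch below is only an illustration under stated assumptions: the class `ProactiveCache`, its methods, and the `(key, amount)` row shape are hypothetical, the "MOLAP equivalent" is reduced to a pre-aggregated dictionary, and the background process is represented by an explicit `background_build()` call rather than a real thread.

```python
class ProactiveCache:
    """Answer sum-by-key queries from relational rows (ROLAP mode) until a
    pre-aggregated copy (MOLAP mode) has been built; drop it on any change."""

    def __init__(self, relational_rows):
        self.rows = list(relational_rows)  # (key, amount) pairs
        self.molap = None                  # None means ROLAP mode

    @property
    def mode(self):
        return "ROLAP" if self.molap is None else "MOLAP"

    def query(self, key):
        if self.molap is not None:
            return self.molap.get(key, 0)  # fast path: pre-aggregated
        return sum(amount for k, amount in self.rows if k == key)

    def background_build(self):
        """Stand-in for the background process that builds the MOLAP copy."""
        snapshot = {}
        for k, amount in self.rows:
            snapshot[k] = snapshot.get(k, 0) + amount
        self.molap = snapshot              # queries now switch to MOLAP

    def on_change(self, key, amount):
        """A relational change drops the cache and falls back to ROLAP."""
        self.rows.append((key, amount))
        self.molap = None


cache = ProactiveCache([("east", 5), ("west", 7), ("east", 1)])
cache.background_build()       # mode becomes "MOLAP"
cache.on_change("east", 4)     # mode falls back to "ROLAP"
```

Queries return the same answer in either mode; only the cost of answering them differs, which is the point of switching.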

    Method, system, and apparatus for exposing workbook ranges as data sources
    9.
    Invention patent application
    Legal status: In force

    Publication No.: US20050267853A1

    Publication date: 2005-12-01

    Application No.: US10858175

    Filing date: 2004-06-01

    CPC classification: G06F17/246 G06F17/30592

    Abstract: A method, system, and apparatus are provided for exposing and utilizing workbook ranges as server data sources. The system includes a client computer capable of executing a spreadsheet application program for creating a workbook including a range that includes data objects. The workbook may be published to a server computer where the specified data objects are exposed as server data sources. The server computer allows client applications to discover and connect to the data objects contained within the workbook as server data sources.

    Efficient large-scale processing of column based data encoded structures
    10.
    Granted invention patent
    Legal status: In force

    Publication No.: US08626725B2

    Publication date: 2014-01-07

    Application No.: US12270872

    Filing date: 2008-11-14

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    CPC classification: G06F17/30492

    Abstract: The subject disclosure relates to efficient query processing over large-scale data storage. An exemplary process includes retrieving a subset of columns implicated by a query as integer encoded and compressed sequences of values corresponding to different columns of data; defining query processing buckets that span the subset of columns based on changes of compression type occurring in the integer encoded and compressed sequences of values; and processing the query in memory on a bucket-by-bucket basis, according to the type of the current bucket. The column-based organization of the data, and the application of a hybrid run-length encoding and bit packing technique, enable a highly efficient and speedy query response in real time.

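Bucket-by-bucket processing over mixed encodings can be sketched as follows. This is a minimal illustration, not the claimed process: the segment tuples `("rle", value, count)` and `("packed", [values])`, the helper names, and the example query (sum a measure column where a filter column equals a wanted value) are all assumptions. Bucket edges are taken wherever either column changes segment, so each bucket lies inside exactly one segment per column and can be handled according to its encoding types.

```python
def seg_len(seg):
    """Number of rows covered by a segment."""
    return seg[2] if seg[0] == "rle" else len(seg[1])


def boundaries(column):
    """Row offsets at which segments start, plus the final row count."""
    edges, row = [0], 0
    for seg in column:
        row += seg_len(seg)
        edges.append(row)
    return edges


def segment_view(column, lo, hi):
    """Return the part of the single segment that covers rows [lo, hi)."""
    row = 0
    for seg in column:
        n = seg_len(seg)
        if row <= lo and hi <= row + n:
            if seg[0] == "rle":
                return ("rle", seg[1], hi - lo)
            return ("packed", seg[1][lo - row:hi - row])
        row += n
    raise ValueError("bucket does not fit inside one segment")


def sum_where(filter_col, measure_col, wanted):
    """SUM(measure) WHERE filter == wanted, processed one bucket at a time."""
    edges = sorted(set(boundaries(filter_col)) | set(boundaries(measure_col)))
    total = 0
    for lo, hi in zip(edges, edges[1:]):
        f = segment_view(filter_col, lo, hi)
        m = segment_view(measure_col, lo, hi)
        if f[0] == "rle":
            if f[1] != wanted:
                continue                   # whole bucket skipped, no row scan
            if m[0] == "rle":
                total += m[1] * (hi - lo)  # both runs: constant-time bucket
            else:
                total += sum(m[1])
        else:
            m_values = [m[1]] * (hi - lo) if m[0] == "rle" else m[1]
            total += sum(v for k, v in zip(f[1], m_values) if k == wanted)
    return total


filter_col = [("rle", 0, 4), ("packed", [1, 0, 1]), ("rle", 1, 3)]
measure_col = [("packed", [5, 6]), ("rle", 10, 6), ("packed", [1, 2])]
result = sum_where(filter_col, measure_col, wanted=0)  # 41
```

Buckets where both columns are run-length encoded cost constant time regardless of how many rows they span, which is where this approach gains over row-at-a-time scanning.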