EFFICIENT QUERY PROCESSING USING HISTOGRAMS IN A COLUMNAR DATABASE
    21.
    Patent application
    Status: Granted

    Publication No.: US20140201129A1

    Publication date: 2014-07-17

    Application No.: US13742287

    Filing date: 2013-01-15

    Abstract: A probabilistic data structure is generated for efficient query processing using a histogram for unsorted data in a column of a columnar database. A bucket range size is determined for multiple buckets of a histogram of a column in a columnar database table. In at least some embodiments, the histogram may be a height-balanced histogram. A probabilistic data structure is generated to indicate for which particular buckets in the histogram there is a data value stored in the data block. When an indication of a query directed to the column for select data is received, the probabilistic data structure for each of the data blocks storing data for the column may be examined to determine particular ones of the data blocks which do not need to be read in order to service the query for the select data.
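The block-skipping idea in this abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes a per-block bitmap with one bit per histogram bucket, and the function names (`height_balanced_boundaries`, `bucket_of`, `block_bitmap`, `blocks_to_read`) are hypothetical. A set bit means the block may hold a matching value; a clear bit means the block can safely be skipped.

```python
from bisect import bisect_left


def height_balanced_boundaries(values, num_buckets):
    """Upper bucket boundaries chosen so each bucket holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[max(0, min(n - 1, (i + 1) * n // num_buckets - 1))]
            for i in range(num_buckets)]


def bucket_of(value, boundaries):
    """Index of the first bucket whose upper boundary is >= value (clamped to the last bucket)."""
    return min(bisect_left(boundaries, value), len(boundaries) - 1)


def block_bitmap(block_values, boundaries):
    """One bit per bucket: bit b is set if the block stores any value that falls in bucket b."""
    bits = 0
    for v in block_values:
        bits |= 1 << bucket_of(v, boundaries)
    return bits


def blocks_to_read(bitmaps, boundaries, wanted):
    """Blocks whose bitmap has the wanted value's bucket set; all other blocks can be skipped."""
    bit = 1 << bucket_of(wanted, boundaries)
    return [i for i, bm in enumerate(bitmaps) if bm & bit]


# Three unsorted data blocks of one column (illustrative data only).
blocks = [[5, 91, 12], [40, 45, 47], [88, 3, 60]]
boundaries = height_balanced_boundaries([v for b in blocks for v in b], 3)
bitmaps = [block_bitmap(b, boundaries) for b in blocks]
```

With this sample data, a query for the value 45 only needs block 1. A query for 60 reads blocks 0 and 2, even though block 0 does not contain 60. That false positive is the price of the compact, probabilistic summary.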

    STREAMING RESTORE OF A DATABASE FROM A BACKUP SYSTEM
    22.
    Patent application
    Status: Granted

    Publication No.: US20140149355A1

    Publication date: 2014-05-29

    Application No.: US13792643

    Filing date: 2013-03-11

    Abstract: A distributed data warehouse system may maintain data blocks on behalf of clients in multiple clusters in a data store. Each cluster may include a single leader node and multiple compute nodes, each including multiple disks storing data. The warehouse system may store primary and secondary copies of each data block on different disks or nodes in a cluster. Each node may include a data structure that maintains metadata about each data block stored on the node, including its unique identifier. The warehouse system may back up data blocks in a remote key-value backup storage system with high durability. A streaming restore operation may be used to retrieve data blocks from backup storage using their unique identifiers as keys. The warehouse system may service incoming queries (and may satisfy some queries by retrieving data from backup storage on an as-needed basis) prior to completion of the restore operation.
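A minimal sketch of the streaming-restore behaviour described in this abstract, assuming the backup store can be modelled as a dictionary keyed by block identifier. The class `StreamingRestore` and its methods are hypothetical names used only for illustration. The point it shows is that queries can be served before the background restore finishes, by fetching missing blocks on demand.

```python
class StreamingRestore:
    """Restores blocks in the background while still answering reads."""

    def __init__(self, backup, block_ids):
        self.backup = backup              # key-value backup store: block id -> bytes
        self.all_ids = list(block_ids)    # every block that must eventually be local
        self.pending = list(block_ids)    # blocks the background restore has not visited
        self.local = {}                   # blocks already restored to local storage

    def restore_next(self):
        """Background step: fetch the next block that is not yet local, if any."""
        while self.pending:
            block_id = self.pending.pop(0)
            if block_id not in self.local:
                self.local[block_id] = self.backup[block_id]
                return block_id
        return None

    def read(self, block_id):
        """Query path: use the local copy, or pull the block from backup on demand."""
        if block_id not in self.local:
            self.local[block_id] = self.backup[block_id]
        return self.local[block_id]

    @property
    def complete(self):
        return all(block_id in self.local for block_id in self.all_ids)


backup = {"b1": b"x", "b2": b"y", "b3": b"z"}
restore = StreamingRestore(backup, ["b1", "b2", "b3"])
```

Calling `restore.read("b3")` before the restore reaches that block fetches it from the backup store and keeps it locally, so the background step later skips it.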

    MAINTAINING DATA LINEAGE TO DETECT DATA EVENTS

    Publication No.: US20180173774A1

    Publication date: 2018-06-21

    Application No.: US15385789

    Filing date: 2016-12-20

    Abstract: History for data objects may be maintained to detect data events. An indication of an Extract, Transform, Load (ETL) process applied to one or more source data objects to generate one or more transformed data objects may be received. History for the source data objects may be updated to include the transformed data objects and the ETL process that generated the transformed data objects. An evaluation of the update may be performed to determine whether an event associated with the data lineage is triggered. If the event is triggered, a notification of the event may be sent to one or more subscribers for the event.
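The flow in this abstract (record an ETL step in the history, evaluate the update, notify subscribers) might be sketched as follows. The `LineageStore` class, its method names, and the shape of an event predicate are all assumptions made for illustration. The abstract does not specify how events are defined.

```python
from collections import defaultdict


class LineageStore:
    """Keeps per-object history and notifies subscribers when an event predicate fires."""

    def __init__(self):
        self.history = defaultdict(list)   # source object -> [(ETL process, transformed object)]
        self.subscriptions = []            # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))

    def record_etl(self, process, sources, outputs):
        """Update the history of each source, then evaluate every subscribed event."""
        for src in sources:
            for out in outputs:
                entry = (process, out)
                self.history[src].append(entry)
                for predicate, callback in self.subscriptions:
                    if predicate(src, entry, self.history[src]):
                        callback(src, entry)


events = []
store = LineageStore()
# Hypothetical event: a source object has been used by two or more ETL steps.
store.subscribe(lambda src, entry, hist: len(hist) >= 2,
                lambda src, entry: events.append((src, entry)))
store.record_etl("clean", ["raw.csv"], ["clean.parquet"])
store.record_etl("aggregate", ["raw.csv"], ["daily.parquet"])
```

Only the second ETL step triggers the event here, because it is the second history entry for `raw.csv`.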

    EFFICIENT QUERY PROCESSING USING HISTOGRAMS IN A COLUMNAR DATABASE

    Publication No.: US20160171064A1

    Publication date: 2016-06-16

    Application No.: US15050104

    Filing date: 2016-02-22

    Abstract: A probabilistic data structure is generated for efficient query processing using a histogram for unsorted data in a column of a columnar database. A bucket range size is determined for multiple buckets of a histogram of a column in a columnar database table. In at least some embodiments, the histogram may be a height-balanced histogram. A probabilistic data structure is generated to indicate for which particular buckets in the histogram there is a data value stored in the data block. When an indication of a query directed to the column for select data is received, the probabilistic data structure for each of the data blocks storing data for the column may be examined to determine particular ones of the data blocks which do not need to be read in order to service the query for the select data.

    SELF-DESCRIBING DATA BLOCKS OF A MINIMUM ATOMIC WRITE SIZE FOR A DATA STORE
    27.
    Patent application
    Status: Under examination (published)

    Publication No.: US20150261610A1

    Publication date: 2015-09-17

    Application No.: US14727644

    Filing date: 2015-06-01

    Abstract: Self-describing data blocks of a minimum atomic write size may be stored for a data store. Data may be received for storage in a data block of a plurality of data blocks at a persistent storage device that are equivalent to a minimum atomic write size for the persistent storage device. Metadata may be generated for the data that includes an error detection code which is generated for the data and the metadata together. The data and the metadata are sent to the persistent storage device to store together in the data block. An individual atomic write operation may write together the data and the metadata in the data block. When accessed, the error detection code is applicable to detect errors. The metadata may also be applicable to determine whether the data is stored for a currently assigned purpose or a previously assigned purpose of the data block.
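One way to picture a self-describing block is a fixed-size buffer whose header carries a checksum computed over the remaining metadata and the data together. The sketch below is an assumption-laden illustration: the 4096-byte block size, the header layout, the use of CRC-32 as the error detection code, and the `purpose_id` field are all hypothetical choices, not details taken from the patent.

```python
import struct
import zlib

BLOCK_SIZE = 4096                  # assumed minimum atomic write size of the device
META = struct.Struct("<IQ")        # payload length, purpose id (hypothetical metadata)
CRC = struct.Struct("<I")          # CRC-32 stored at the front of the block
MAX_PAYLOAD = BLOCK_SIZE - CRC.size - META.size


def pack_block(payload, purpose_id):
    """Build one block: CRC, then metadata, then zero-padded data."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in one atomic block")
    body = META.pack(len(payload), purpose_id) + payload.ljust(MAX_PAYLOAD, b"\0")
    # The checksum covers the metadata and the data together.
    return CRC.pack(zlib.crc32(body)) + body


def unpack_block(block, expected_purpose):
    """Verify the block and return its payload, or None if it belongs to another purpose."""
    (stored_crc,) = CRC.unpack_from(block, 0)
    body = block[CRC.size:]
    if zlib.crc32(body) != stored_crc:
        raise ValueError("block failed its error detection check")
    length, purpose_id = META.unpack_from(body, 0)
    if purpose_id != expected_purpose:
        return None                # leftover data from a previously assigned purpose
    return body[META.size:META.size + length]


block = pack_block(b"hello", purpose_id=7)
```

Because the whole buffer is exactly `BLOCK_SIZE` bytes, a single atomic write can persist the data and its metadata together, and a reader can tell stale or corrupted blocks apart without consulting any external index.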

    EFFICIENT QUERY PROCESSING IN COLUMNAR DATABASES USING BLOOM FILTERS
    28.
    Patent application
    Status: Under examination (published)

    Publication No.: US20150169655A1

    Publication date: 2015-06-18

    Application No.: US14635844

    Filing date: 2015-03-02

    CPC classification number: G06F17/30315 G06F17/30563 G06F17/30592

    Abstract: A bloom filter is generated for efficient query processing for unsorted data in a column of a columnar database. Bloom filters represented as bitmaps are generated for data blocks storing data for a column of a columnar database table. An indication of a query directed toward the column is received and the bloom filter for each data block is examined to determine which ones of the data blocks do not need to be read in order to service the query for the select data. Data is then read from the data blocks storing data for the column excepting the ones which do not need to be read.
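As a rough illustration of per-block Bloom filters, the sketch below keeps one bitmap per data block and reads a block only when its filter reports a possible match. The filter size, the number of hash functions, the SHA-256-based hashing, and the names `BloomFilter` and `scan` are assumptions chosen for the example rather than details from the patent.

```python
import hashlib


class BloomFilter:
    """Bitmap-backed Bloom filter; false positives are possible, false negatives are not."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                      # the bitmap, held in a Python integer

    def _positions(self, value):
        data = repr(value).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def scan(blocks, filters, wanted):
    """Return the matching values and the indices of the blocks that were actually read."""
    hits, read = [], []
    for i, (block, bloom) in enumerate(zip(blocks, filters)):
        if not bloom.might_contain(wanted):
            continue                       # filter rules the block out: skip the read
        read.append(i)
        hits.extend(v for v in block if v == wanted)
    return hits, read


blocks = [["ann", "bob"], ["cat", "dan"], ["eve", "bob"]]
filters = []
for block in blocks:
    bloom = BloomFilter()
    for value in block:
        bloom.add(value)
    filters.append(bloom)
```

A block that contains the requested value is always read, since a Bloom filter never reports a false negative. Blocks without the value are usually skipped, though an occasional false positive can cause an unnecessary read.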

    EFFICIENT DATA COMPRESSION AND ANALYSIS AS A SERVICE
    29.
    Patent application
    Status: Granted

    Publication No.: US20140351229A1

    Publication date: 2014-11-27

    Application No.: US13900350

    Filing date: 2013-05-22

    Abstract: Data may be efficiently analyzed and compressed as part of a data compression service. A data compression request may be received from a client indicating data to be compressed. An analysis of the data or metadata associated with the data may be performed. In at least some embodiments, this analysis may be a rules-based analysis. Some embodiments may employ one or more machine learning techniques to historical compression data to update the rules-based analysis. One or more compression techniques may be selected out of a plurality of compression techniques to be applied to the data. Data compression candidates may then be generated according to the selected compression techniques. In some embodiments, a compression service restriction may be enforced. One of the data compression candidates may be selected and sent in a response.
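The select-then-compare flow in this abstract can be sketched with codecs from the Python standard library. The size-based rule in `select_techniques`, the `max_candidates` limit standing in for a service restriction, and the function names are hypothetical. The abstract does not say which rules or techniques the service uses.

```python
import bz2
import lzma
import zlib

# Available techniques: name -> (compress, decompress)
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def select_techniques(data, max_candidates=2):
    """Toy rules-based analysis: rank techniques by input size, then apply the limit."""
    if len(data) < 1024:
        ranked = ["zlib", "bz2", "lzma"]   # small inputs: try the cheapest codec first
    else:
        ranked = ["lzma", "bz2", "zlib"]   # larger inputs: try the stronger codecs first
    return ranked[:max_candidates]          # stand-in for a compression service restriction


def compress_request(data, max_candidates=2):
    """Generate one candidate per selected technique and return the smallest."""
    candidates = {name: CODECS[name][0](data)
                  for name in select_techniques(data, max_candidates)}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]


data = b"abc" * 1000
name, blob = compress_request(data)
```

The response carries the name of the chosen technique alongside the compressed bytes, so the client knows which decompressor to apply.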

    FAST CRASH RECOVERY FOR DISTRIBUTED DATABASE SYSTEMS
    30.
    Patent application
    Status: Under examination (published)

    Publication No.: US20140279930A1

    Publication date: 2014-09-18

    Application No.: US14201505

    Filing date: 2014-03-07

    Abstract: A distributed database system may implement fast crash recovery. Upon recovery from a database head node failure, a connection with one or more storage nodes of a distributed storage system storing data for a database implemented by the database head node may be established. Upon establishment of the connection with the storage nodes, that database may be made available for access, such as for various access requests. In various embodiments, redo log records may not be replayed in order to provide access to the database. In at least some embodiments, the storage nodes may provide a current state of data stored for the database in response to requests.
