-
Publication No.: US20140149355A1
Publication Date: 2014-05-29
Application No.: US13792643
Filing Date: 2013-03-11
Applicant: Amazon Technologies, Inc.
Inventor: ANURAG WINDLASS GUPTA, JAKUB KULESZA, DEEPAK AGARWAL, ALEKSANDRAS SURNA, TUSHAR JAIN, ZELAINE FONG, STEFANO STEFANI
IPC: G06F17/30
CPC classification number: G06F17/30575, G06F11/1446, G06F11/1471, G06F17/30008, G06F17/30371, G06F17/30424, G06F2201/82
Abstract: A distributed data warehouse system may maintain data blocks on behalf of clients in multiple clusters in a data store. Each cluster may include a single leader node and multiple compute nodes, each including multiple disks storing data. The warehouse system may store primary and secondary copies of each data block on different disks or nodes in a cluster. Each node may include a data structure that maintains metadata about each data block stored on the node, including its unique identifier. The warehouse system may back up data blocks in a remote key-value backup storage system with high durability. A streaming restore operation may be used to retrieve data blocks from backup storage using their unique identifiers as keys. The warehouse system may service incoming queries (and may satisfy some queries by retrieving data from backup storage on an as-needed basis) prior to completion of the restore operation.
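The streaming restore described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, its field names, and the use of plain dictionaries as stand-ins for local disks and for the remote key-value backup store are all assumptions made for illustration. The sketch shows only the core idea that block identifiers serve as backup keys and that a query can pull a block on demand before the background restore reaches it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node (names are illustrative, not from the patent)."""
    backup: dict                                # stand-in for the remote key-value backup store
    metadata: set                               # unique IDs of the blocks this node should hold
    local: dict = field(default_factory=dict)   # blocks currently present on local disks

    def read_block(self, block_id):
        """Serve a query; fetch the block from backup if it is not restored yet."""
        if block_id not in self.metadata:
            raise KeyError(block_id)
        if block_id not in self.local:
            # The block's unique identifier is used directly as the backup key.
            self.local[block_id] = self.backup[block_id]
        return self.local[block_id]

    def streaming_restore(self):
        """Restore one missing block per step so that queries can interleave."""
        for block_id in sorted(self.metadata):
            if block_id not in self.local:
                self.local[block_id] = self.backup[block_id]
                yield block_id


# Usage: a query is served while the restore is still in progress.
backup = {"b1": b"alpha", "b2": b"beta", "b3": b"gamma"}
node = Node(backup=backup, metadata=set(backup))
restore = node.streaming_restore()
first = next(restore)               # background restore handles one block
early = node.read_block("b3")       # query arrives before "b3" was restored
remaining = list(restore)           # restore skips blocks already fetched on demand
```

A generator is used here only to make the interleaving of restore steps and queries visible in a single thread; a real system would presumably run the restore concurrently.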
-
Publication No.: US20140149356A1
Publication Date: 2014-05-29
Application No.: US13792671
Filing Date: 2013-03-11
Applicant: Amazon Technologies, Inc.
Inventor: DEEPAK AGARWAL, ANURAG WINDLASS GUPTA, JAKUB KULESZA
IPC: G06F17/30
CPC classification number: G06F17/30575, G06F11/1446, G06F11/1471, G06F17/30008, G06F17/30371, G06F17/30424, G06F2201/82
Abstract: A distributed data warehouse system maintains data blocks on behalf of clients, and stores primary and secondary copies of data blocks on different disks or nodes in a cluster. The data warehouse system may back up data blocks in a key-value backup storage system. In response to a query targeting a data block previously stored in the cluster, the data warehouse system may determine whether a consistent, uncorrupted copy of the data block is available in the cluster (e.g., by applying a consistency check). If not (e.g., if a disk or node failed), the data warehouse system may automatically initiate an operation to restore the data block from the backup storage system, using a unique identifier of the data block to access a backup copy. The target data may be returned in a query response prior to restoring primary and secondary copies of the data block in the cluster.
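The query-time recovery path in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the abstract mentions only "a consistency check", so the SHA-256 checksum comparison is a swapped-in stand-in, and the function names, the `repair_queue` list, and the dictionaries representing disks and the backup store are hypothetical. The point shown is the ordering: the query result is returned first, and the primary and secondary copies are rebuilt afterwards.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Assumed consistency check: compare a SHA-256 digest with the stored one."""
    return hashlib.sha256(data).hexdigest()


def read_for_query(block_id, expected_sum, primary, secondary, backup, repair_queue):
    """Return block data for a query, falling back to backup storage if needed."""
    for disk in (primary, secondary):
        data = disk.get(block_id)
        if data is not None and checksum(data) == expected_sum:
            return data  # a consistent, uncorrupted copy exists in the cluster
    # No usable copy in the cluster: read the backup copy by unique identifier
    # and defer rebuilding the local copies until after the query is answered.
    data = backup[block_id]
    repair_queue.append(block_id)
    return data


def run_repairs(primary, secondary, backup, repair_queue):
    """Rebuild primary and secondary copies for every queued block."""
    while repair_queue:
        block_id = repair_queue.pop()
        primary[block_id] = backup[block_id]
        secondary[block_id] = backup[block_id]


# Usage: the primary copy is corrupt and the secondary copy is missing.
good = b"rows"
primary, secondary = {"b7": b"corrupt"}, {}
backup, queue = {"b7": good}, []
result = read_for_query("b7", checksum(good), primary, secondary, backup, queue)
pending = list(queue)               # repair is queued, not yet performed
run_repairs(primary, secondary, backup, queue)
```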
-