Abstract:
A distributed data warehouse system may maintain data blocks in multiple clusters of a data store on behalf of clients. Each cluster may include a single leader node and multiple compute nodes, each including multiple disks storing data. The warehouse system may store primary and secondary copies of each data block on different disks or nodes in a cluster. Each node may include a data structure that maintains metadata about each data block stored on the node, including its unique identifier. The warehouse system may back up data blocks in a remote key-value backup storage system with high durability. A streaming restore operation may be used to retrieve data blocks from backup storage using their unique identifiers as keys. The warehouse system may service incoming queries (and may satisfy some queries by retrieving data from backup storage on an as-needed basis) prior to completion of the restore operation.
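As a rough illustration of the streaming restore described above, the following Python sketch models the backup store as a simple dictionary keyed by block identifier; the class and method names are hypothetical, not the patented implementation. It shows a node pulling blocks back in the background while still answering reads, fetching any not-yet-restored block from backup on demand.

```python
# A minimal sketch, assuming a dict-like key-value backup store keyed by each
# block's unique identifier; hypothetical names, not the patented implementation.

class StreamingRestoreNode:
    def __init__(self, backup_store, block_ids_to_restore):
        self.backup = backup_store                  # block_id -> block bytes
        self.local_blocks = {}                      # blocks restored so far
        self.pending = list(block_ids_to_restore)   # restore work remaining

    def restore_step(self):
        """Background restore: pull the next missing block from backup storage."""
        if self.pending:
            block_id = self.pending.pop(0)
            self.local_blocks.setdefault(block_id, self.backup[block_id])

    def read_block(self, block_id):
        """Serve a query; fetch from backup on an as-needed basis if not yet restored."""
        if block_id not in self.local_blocks:
            self.local_blocks[block_id] = self.backup[block_id]
            if block_id in self.pending:
                self.pending.remove(block_id)
        return self.local_blocks[block_id]

# Example: the node answers a query for block "b2" before the restore finishes.
node = StreamingRestoreNode({"b1": b"payload-1", "b2": b"payload-2"}, ["b1", "b2"])
print(node.read_block("b2"))
```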
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
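The request shapes described in this abstract can be pictured with a small in-memory sketch; the client class, method names, and tuple-based composite keys below are assumptions made for illustration, not the service's actual API or wire protocol.

```python
# A minimal sketch of table lifecycle, item operations, and query/scan with
# filtering; hypothetical names, in-memory only.

class DataStoreClient:
    def __init__(self):
        self.tables = {}   # table name -> {composite key tuple -> item dict}

    # -- table operations --------------------------------------------------
    def create_table(self, name):
        self.tables[name] = {}              # the real service may run an async workflow

    def delete_table(self, name):
        self.tables.pop(name, None)

    def describe_table(self, name):
        return {"name": name, "item_count": len(self.tables[name])}

    # -- item operations ---------------------------------------------------
    def put_item(self, table, key, item):
        self.tables[table][key] = dict(item)

    def get_item(self, table, key):
        return self.tables[table].get(key)

    def delete_item(self, table, key):
        self.tables[table].pop(key, None)

    def update_attributes(self, table, key, changes):
        self.tables[table][key].update(changes)

    # -- query / scan ------------------------------------------------------
    def query(self, table, hash_key):
        """Return items whose composite key starts with the given hash key."""
        return [v for k, v in self.tables[table].items() if k[0] == hash_key]

    def scan(self, table, predicate):
        """Full-table scan with a caller-supplied filter."""
        return [v for v in self.tables[table].values() if predicate(v)]

# Example: a composite primary key of (customer, order number).
client = DataStoreClient()
client.create_table("orders")
client.put_item("orders", ("alice", 1), {"total": 30})
client.put_item("orders", ("alice", 2), {"total": 12})
print(client.query("orders", "alice"))
```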
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of partitions that are stored on respective computing nodes in the system. The system may split a data partition into two new partitions, and may split the replica group that stored the original partition into two new replica groups, each storing one of the new partitions. To split the replica group, the master replica may propagate membership changes to the other members of the replica group, first adding members to the original replica group and then splitting the expanded replica group into two new replica groups. Subsequent to the split, replicas may attempt to become the master for the original replica group or for a new replica group. If an attempt to become master replica for the original replica group succeeds, the split may fail.
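A toy sketch of the two-phase membership change described above, using hypothetical Replica objects and a propagate_membership helper; it illustrates the idea of expanding the group and then dividing it, not the patented protocol itself.

```python
# A minimal sketch, assuming the master "propagates" a membership change by
# overwriting each member's view of the group; hypothetical names throughout.

class Replica:
    def __init__(self, name):
        self.name = name
        self.membership = []      # this replica's view of its group

def propagate_membership(members, partition):
    # The master pushes the new membership (and partition assignment) to members.
    for replica in members:
        replica.membership = [m.name for m in members]
        replica.partition = partition

def split_replica_group(original_members, new_members, partition_a, partition_b):
    # Phase 1: expand the original replica group with the added members.
    expanded = original_members + new_members
    propagate_membership(expanded, partition_a)

    # Phase 2: split the expanded group into two new replica groups.
    half = len(expanded) // 2
    group_a, group_b = expanded[:half], expanded[half:]
    propagate_membership(group_a, partition_a)
    propagate_membership(group_b, partition_b)
    return group_a, group_b

# Example: a three-replica group gains three members, then splits into two groups.
originals = [Replica(f"r{i}") for i in range(3)]
added = [Replica(f"r{i}") for i in range(3, 6)]
split_replica_group(originals, added, "partition-A", "partition-B")
```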
Abstract:
A distributed data warehouse system maintains data blocks on behalf of clients, and stores primary and secondary copies of data blocks on different disks or nodes in a cluster. The data warehouse system may back up data blocks in a key-value backup storage system. In response to a query targeting a data block previously stored in the cluster, the data warehouse system may determine whether a consistent, uncorrupted copy of the data block is available in the cluster (e.g., by applying a consistency check). If not (e.g., if a disk or node failed), the data warehouse system may automatically initiate an operation to restore the data block from the backup storage system, using a unique identifier of the data block to access a backup copy. The target data may be returned in a query response prior to restoring primary and secondary copies of the data block in the cluster.
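The check-then-restore path described above can be sketched as follows; the checksum-based consistency check, the dictionary-style backup store, and the restore queue are assumptions made for illustration rather than the warehouse system's actual mechanisms.

```python
# A minimal sketch: answer from the primary or secondary copy if one passes a
# consistency check, otherwise fetch the backup copy by the block's unique
# identifier and defer re-creating the in-cluster copies. Hypothetical names.

import hashlib

def is_consistent(copy, expected_digest):
    return copy is not None and hashlib.sha256(copy).hexdigest() == expected_digest

def read_for_query(block_id, primary, secondary, expected_digest, backup_store, restore_queue):
    for copy in (primary, secondary):
        if is_consistent(copy, expected_digest):
            return copy
    # No consistent, uncorrupted copy in the cluster (e.g., a disk or node failed).
    block = backup_store[block_id]        # key-value lookup by unique identifier
    restore_queue.append(block_id)        # restore primary/secondary copies later
    return block                          # target data returned before the restore

# Example: the primary copy is corrupted, so the backup copy answers the query.
good = b"block-contents"
digest = hashlib.sha256(good).hexdigest()
queue = []
print(read_for_query("blk-7", b"corrupted", None, digest, {"blk-7": good}, queue))
```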
Abstract:
Disclosed are various embodiments for distributing data items. A plurality of nodes forms a distributed data store. A new master candidate is determined through an election among the plurality of nodes. Before performing a failover from a failed master to the new master candidate, a consensus is reached among a locality-based failover quorum of the nodes. The quorum excludes any of the nodes that are in a failover quorum ineligibility mode.
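One way to picture a locality-based failover quorum is the sketch below, which assumes each node reports its locality (e.g., a data center), whether it acknowledges the candidate, and whether it is in the ineligibility mode; the field names and thresholds are hypothetical.

```python
# A minimal sketch: consensus requires acknowledgements from enough localities,
# and nodes in the failover-quorum-ineligibility mode are excluded.

from collections import defaultdict

def failover_quorum_reached(nodes, localities_required, acks_per_locality):
    acks = defaultdict(int)
    for node in nodes:
        if node["ineligible"]:
            continue                      # excluded from the failover quorum
        if node["acks_candidate"]:
            acks[node["locality"]] += 1
    satisfied = sum(1 for count in acks.values() if count >= acks_per_locality)
    return satisfied >= localities_required

# Example: two localities must each contribute at least one eligible acknowledgement.
nodes = [
    {"locality": "dc-1", "acks_candidate": True, "ineligible": False},
    {"locality": "dc-1", "acks_candidate": True, "ineligible": True},   # skipped
    {"locality": "dc-2", "acks_candidate": True, "ineligible": False},
]
print(failover_quorum_reached(nodes, localities_required=2, acks_per_locality=1))  # True
```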
Abstract:
A system that implements a data storage service may maintain tables in a data store on behalf of clients. The service may maintain table data in multiple replicas of partitions of the data that are stored on respective computing nodes in the system. In response to detecting a failure or fault condition, or receiving a service request from a client to move or copy a partition replica, the data store may copy a partition replica to another computing node using a physical copy mechanism. The physical copy mechanism may copy table data from physical storage locations in which it is stored to physical storage locations allocated to a destination replica on the other computing node. During copying, service requests to modify table data may be logged and applied to the replica being copied. A catch-up operation may be performed to apply modification requests received during copying to the destination replica.
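A compact sketch of the copy-then-catch-up sequence described above; the DestinationReplica interface and the modification log are hypothetical stand-ins for the physical copy mechanism and the request log.

```python
# A minimal sketch: copy raw storage blocks to the destination, then replay
# modifications that were logged while the copy was in progress.

class DestinationReplica:
    def __init__(self):
        self.blocks = {}
        self.applied_ops = []

    def write_block(self, offset, data):
        self.blocks[offset] = data            # raw physical copy, not row-by-row

    def apply(self, op):
        self.applied_ops.append(op)           # logical modification from the log

def move_replica(source_blocks, destination, modification_log):
    # Phase 1: physical copy of the source replica's storage.
    for offset, block in enumerate(source_blocks):
        destination.write_block(offset, block)
    # Phase 2: catch-up, applying writes logged during the copy.
    for op in modification_log:
        destination.apply(op)

# Example: two blocks are copied, then one logged update is replayed.
dest = DestinationReplica()
move_replica([b"block0", b"block1"], dest, [{"op": "put", "key": "k1", "value": 2}])
```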
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
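The failover attempt can be sketched roughly as below, assuming an external lock modeled as a dictionary and peers that report a sequence number plus whether they support the candidate; the names are illustrative, not the patented protocol.

```python
# A minimal sketch: acquire the group lock, gather peer state, sync from any
# peer with newer data, and become master only if the failover quorum is large enough.

def attempt_to_become_master(candidate, peers, lock_service, group_name, quorum_size):
    if lock_service.setdefault(group_name, candidate["id"]) != candidate["id"]:
        return False                                   # lock already held by another replica
    try:
        quorum = [candidate["id"]]                     # the candidate counts toward its quorum
        for peer in peers:
            if peer["sequence"] > candidate["sequence"]:
                candidate["sequence"] = peer["sequence"]   # sync from a more recent replica
            if peer["supports_candidate"]:
                quorum.append(peer["id"])              # peer joins the failover quorum
        if len(quorum) >= quorum_size:
            candidate["is_master"] = True
            return True
        return False
    finally:
        lock_service.pop(group_name, None)             # release the group lock

# Example: two of three peers support the candidate, meeting a quorum of three.
candidate = {"id": "r1", "sequence": 5, "is_master": False}
peers = [
    {"id": "r2", "sequence": 7, "supports_candidate": True},
    {"id": "r3", "sequence": 5, "supports_candidate": True},
    {"id": "r4", "sequence": 5, "supports_candidate": False},
]
print(attempt_to_become_master(candidate, peers, {}, "group-1", quorum_size=3))  # True
```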
Abstract:
A system that implements a scalable data storage service may maintain tables in a data store on behalf of storage service clients. The service may maintain table data in multiple replicas of partitions that are stored on respective computing nodes in the system. In response to detecting an anomaly in the system, detecting a change in data volume on a partition or service request traffic directed to a partition, or receiving a service request from a client to split a partition, the data storage service may create additional copies of a partition replica using a physical copy mechanism. The data storage service may issue a split command defined in an API for the data store to divide the original and additional replicas into multiple replica groups, and to configure each replica group to maintain a respective portion of the table data that was stored in the partition before the split.
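A simplified sketch of the split described above, assuming replicas can be copied in memory and that the split assigns each new replica group half of a numeric key range; the names, the data layout, and the key-range rule are hypothetical, not the data store's actual split API.

```python
# A minimal sketch: create additional copies of a partition replica, divide the
# replicas into two replica groups, and have each group keep only its portion.

import copy

def split_partition(replica, extra_copies, key_range):
    # Step 1: create additional copies of the partition replica (stand-in for a physical copy).
    replicas = [replica] + [copy.deepcopy(replica) for _ in range(extra_copies)]

    # Step 2: divide the replicas into two replica groups, each owning half the key range.
    low, high = key_range
    mid = (low + high) // 2
    half = len(replicas) // 2
    group_a = {"replicas": replicas[:half], "key_range": (low, mid)}
    group_b = {"replicas": replicas[half:], "key_range": (mid, high)}

    # Step 3: each group keeps only the rows that fall in its portion of the range.
    for group in (group_a, group_b):
        lo, hi = group["key_range"]
        for r in group["replicas"]:
            r["items"] = {k: v for k, v in r["items"].items() if lo <= k < hi}
    return group_a, group_b

# Example: one replica plus three copies split into two groups of two.
original = {"items": {1: "a", 5: "b", 9: "c"}}
group_a, group_b = split_partition(original, extra_copies=3, key_range=(0, 10))
```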