Abstract:
Updates for clients accessing a data volume across fault tolerance zones may be replicated. Requests to write or read a block of a logical volume with replicas located in different fault tolerance zones may be sent to the different replicas. Responses for the different requests may be evaluated to determine whether quorum is satisfied for the write or read of the block of the logical volume. For writes that satisfy quorum, a request to commit the write may be sent to the replicas of the logical volume.
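The quorum evaluation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the majority rule, the function names, and the "commit"/"no-commit" return values are all assumptions, since the abstract does not define how quorum is computed.

```python
from typing import List


def quorum_satisfied(acks: List[bool], quorum: int) -> bool:
    """Return True when at least `quorum` replicas acknowledged the request."""
    return sum(acks) >= quorum


def evaluate_write(replica_responses: List[bool]) -> str:
    """Decide whether a commit request should go out to the replicas.

    Assumption: quorum is a simple majority of the replicas contacted.
    """
    quorum = len(replica_responses) // 2 + 1
    if quorum_satisfied(replica_responses, quorum):
        return "commit"      # a commit request would be sent to the replicas
    return "no-commit"       # hypothetical outcome when quorum is not met
```

With three replicas in different fault tolerance zones, two acknowledgements are enough under this assumed rule, so one zone can be unreachable without blocking the write.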
Abstract:
A resource management service implements techniques for provisioning a process with computing resources sufficient to process a query. A query is received and computing system resources sufficient to process the query are provisioned. A response to the query is generated by running the process with the provisioned computing system resources.
Abstract:
The present disclosure generally relates to creating virtualized block storage devices whose data is replicated across isolated computing systems to lower risk of data loss even in wide-scale events, such as natural disasters. The virtualized device can include at least two volumes, each of which is implemented in a distinct computing system. In the case of a failed volume, a new volume can be created and populated with data from the surviving volume. During population, new writes can continue to be replicated to the new volume. The population process can write data from the surviving volume to the new volume “under” new writes, such that the population process does not overwrite data included in the new writes.
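One way to picture population "under" new writes is to track which blocks have received a new write and have the population pass skip them. The sketch below assumes an in-memory dictionary per volume and a set of written block numbers; these structures and names are hypothetical and only illustrate the ordering rule.

```python
from typing import Dict, Set


def replicate_write(new_volume: Dict[int, bytes], written: Set[int],
                    block: int, data: bytes) -> None:
    """Apply a new write to the new volume and remember that the block is fresh."""
    new_volume[block] = data
    written.add(block)


def populate(surviving: Dict[int, bytes], new_volume: Dict[int, bytes],
             written: Set[int]) -> None:
    """Copy blocks from the surviving volume without overwriting new writes."""
    for block, data in surviving.items():
        if block in written:
            continue  # a newer write already landed here; leave it in place
        new_volume[block] = data
```

In this sketch a write that arrives before the population pass reaches its block wins, because the copy step checks the `written` set first.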
Abstract:
Providers of web services and other types of software as a service may be subject to service-level agreements requiring that response times be within a defined range. For efficiency, multiple services may be hosted on the same set of computing nodes, which may jeopardize adherence to service-level agreements. A control system may involve classifying service requests and determining desired values for measurements such as latency. An error value may be calculated based on the difference between measured and desired values. A controller may adjust a rate of capacity utilization for the computing nodes based on the current error, a history of past errors, and a prediction of future errors.
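The three inputs named above (the current error, a history of past errors, and a prediction of future errors) resemble the proportional, integral, and derivative terms of a PID controller. That mapping is an interpretation rather than something the abstract states. The sketch below assumes latency in milliseconds, hypothetical gains `kp`, `ki`, `kd`, and a sign convention in which a positive output means measured latency is above the target.

```python
class CapacityController:
    """PID-style controller producing an adjustment signal from latency error."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0     # accumulated history of past errors
        self.prev_error = 0.0   # used to estimate the error trend

    def update(self, measured: float, desired: float, dt: float = 1.0) -> float:
        error = measured - desired                    # current error
        self.integral += error * dt                   # history of past errors
        derivative = (error - self.prev_error) / dt   # trend as a crude prediction
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A caller would map the returned signal onto a change in the capacity utilization rate for the computing nodes; how that mapping is done is left open here.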
Abstract:
Generally described, one or more aspects of the present application correspond to a highly distributed replica of a volume stored in a networked computing environment. First and second replicas of the volume can be synchronously replicated, and in some implementations the highly distributed replica, serving as a tertiary replica, can be asynchronously replicated. The highly distributed nature of the tertiary replica supports parallel transfer of the data of the volume, resulting in faster creation of backups and new copies of the volume.
Abstract:
Methods and apparatus for equitable distribution of excess shared-resource throughput capacity are disclosed. A first and a second work target are configured to access a shared resource to implement accepted work requests. Admission control is managed at the work targets using respective token buckets. A first metric indicative of the work request arrival rates at the work targets during a time interval, and a second metric associated with the provisioned capacities of the work targets are determined. A number of tokens determined based on a throughput limit of the shared resource is distributed among the work targets to be used for admission control during a subsequent time interval. The number of tokens distributed to each work target is based on the first metric and/or the second metric.
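As a rough illustration of the distribution step, the sketch below splits a token budget among work targets using a blend of each target's share of arrivals (the first metric) and its share of provisioned capacity (the second metric). The blending weight `alpha`, the rounding down to whole tokens, and the function name are assumptions; the abstract only says the split is based on one or both metrics.

```python
from typing import List


def distribute_tokens(total: int, arrival_rates: List[float],
                      provisioned: List[float], alpha: float = 0.5) -> List[int]:
    """Split `total` tokens among work targets for the next interval.

    alpha = 1.0 uses only arrival rates; alpha = 0.0 uses only provisioned capacity.
    """
    a_sum = sum(arrival_rates)
    p_sum = sum(provisioned)
    tokens = []
    for arrivals, capacity in zip(arrival_rates, provisioned):
        a_share = arrivals / a_sum if a_sum else 0.0
        p_share = capacity / p_sum if p_sum else 0.0
        share = alpha * a_share + (1.0 - alpha) * p_share
        tokens.append(int(total * share))  # round down; leftover tokens stay unassigned
    return tokens
```

Each work target would then add its allotment to its own token bucket for admission control during the subsequent interval.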
Abstract:
A database service may maintain tables on behalf of clients and may provision throughput capacity for those tables. A table may be divided into multiple partitions according to a hash of the primary key values for each of the items in the table, and the items in the table may be accessed using the hash of their primary key values. Provisioned throughput capacity for the table may be divided between the partitions and used in servicing requests directed to items in the table. The service (or underlying system) may provide mechanisms for generating skew-related metrics or reports and presenting them to clients via a graphical user interface (GUI). The metrics and reports may indicate the amount of uniformity or skew in the distribution of requests across the key space for the table using histograms, heat maps, or other representations. Clients may initiate actions to correct any skew via the GUI.
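The partitioning and skew reporting described above can be sketched as follows. The choice of SHA-256, the modulo mapping, and the max-over-mean skew ratio are assumptions made for illustration; the abstract does not name a hash function or a specific metric.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a primary key to a partition using a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def request_histogram(keys: Iterable[str], num_partitions: int) -> List[int]:
    """Count requests per partition for a stream of requested keys."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(histogram: List[int]) -> float:
    """Ratio of the busiest partition to the mean; 1.0 means perfectly uniform."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    return max(histogram) / (total / len(histogram))
```

A histogram like this could back the kind of chart a GUI presents, and a ratio well above 1.0 would indicate that requests concentrate on a small part of the key space.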
Abstract:
Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status metadata, a number of available replicas for at least one replica group may be determined to be noncompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
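A minimal sketch of the detection and scheduling steps follows. It assumes that the healthy state definition is a required replica count per group, that each healing operation's resource requirement equals the number of replicas to restore, and that a single integer budget stands in for the resource constraints. The ordering rule (largest deficit first) is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealingOp:
    group: str
    missing: int  # replicas below the healthy count
    cost: int     # resource requirement, in hypothetical units


def find_unhealthy(status: Dict[str, int], healthy: Dict[str, int]) -> List[HealingOp]:
    """Compare available replica counts against the healthy state definition."""
    ops = []
    for group, available in status.items():
        deficit = healthy[group] - available
        if deficit > 0:
            ops.append(HealingOp(group, deficit, cost=deficit))
    return ops


def schedule(ops: List[HealingOp], budget: int) -> List[HealingOp]:
    """Order operations by deficit and keep those that fit the resource budget."""
    scheduled = []
    for op in sorted(ops, key=lambda o: (-o.missing, o.cost)):
        if op.cost <= budget:
            scheduled.append(op)
            budget -= op.cost
    return scheduled
```

Operations that do not fit the current budget are simply left for a later scheduling pass in this sketch.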
Abstract:
The present disclosure generally relates to creating virtualized block storage devices whose data is replicated across isolated computing systems to lower risk of data loss even in wide-scale events, such as natural disasters. The virtualized device can include at least two volumes, each of which is implemented in a distinct computing system. Each volume can be implemented by at least two computing devices, a first of which is configured as a primary device to which reads from and writes to the volume are directed. Of the two volumes, one can be indicated as primary, indicating authority to accept reads from and writes to the virtualized device. A primary device of the primary volume, on obtaining a write to the volume, can replicate the write to both a secondary device of the primary volume and to the secondary volume.
Abstract:
Generally described, aspects of the present application correspond to maintaining a message stream for a network-based data store, which stream includes messages reflecting modifications to the data store. Messages within the stream may be used to revert a state of the data store to a prior point in time reflected within the messages of the stream, such as by “rewinding” operations on the data store by use of the messages within the stream. Messages in the stream may further be used to asynchronously update a replica of the data store.
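The rewind and replica-update uses of the stream can be sketched with a log whose messages record both the old and the new value of each modified key. The message layout, the integer timestamps, and the convention that stored values are never `None` are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical message layout: (timestamp, key, old_value, new_value)
Message = Tuple[int, str, Optional[Any], Any]


def apply_write(store: Dict[str, Any], stream: List[Message],
                ts: int, key: str, value: Any) -> None:
    """Modify the data store and append a message describing the change."""
    stream.append((ts, key, store.get(key), value))
    store[key] = value


def rewind(store: Dict[str, Any], stream: List[Message], to_ts: int) -> None:
    """Undo messages newer than `to_ts`, most recent first."""
    while stream and stream[-1][0] > to_ts:
        _, key, old, _new = stream.pop()
        if old is None:
            store.pop(key, None)  # the key did not exist before this write
        else:
            store[key] = old


def replay(replica: Dict[str, Any], stream: List[Message]) -> None:
    """Bring a replica up to date by applying messages in order."""
    for _, key, _old, new in stream:
        replica[key] = new
```

Because the replica only reads the stream, `replay` can run at any later time, which is what makes the update asynchronous in this sketch.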