Abstract:
A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. The system may determine whether it is operating in an overloaded or underloaded state based on a current work throughput rate, a target work throughput rate, a maximum request rate, or an actual request rate, and may dynamically adjust the maximum request rate in response. For example, if the maximum request rate is being exceeded, the maximum request rate may be raised or lowered, depending on the current work throughput rate. If the target or committed work throughput rate is being exceeded, but the maximum request rate is not, a lower maximum request rate may be proposed. Adjustments to the maximum request rate may be made as a series of incremental adjustments. Service request tokens may be added to a leaky token bucket at the maximum request rate.
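The leaky-token-bucket admission scheme this abstract describes can be illustrated with a short sketch. The Python below is a minimal, hypothetical model (all class and parameter names are invented, not taken from the patent): tokens accrue at the maximum request rate up to the bucket's capacity, a request is admitted only if a token is available, and an adjustment step incrementally raises or lowers the rate based on observed versus target throughput.

```python
import time

class AdjustableTokenBucket:
    """Hypothetical sketch: a token bucket whose refill rate (the maximum
    request rate) is adjusted incrementally based on observed throughput."""

    def __init__(self, max_request_rate, capacity):
        self.max_request_rate = max_request_rate  # tokens added per second
        self.capacity = capacity                  # bucket size; excess leaks away
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        # Tokens accrue at the maximum request rate; the bucket never
        # holds more than its capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.max_request_rate)
        self.last_refill = now

    def try_admit(self):
        """Admit a request only if a token is available."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def adjust(self, current_throughput, target_throughput, step=0.1):
        """One incremental adjustment: lower the maximum request rate when
        throughput exceeds the target, raise it when there is headroom."""
        if current_throughput > target_throughput:
            self.max_request_rate *= (1 - step)
        else:
            self.max_request_rate *= (1 + step)
```

Calling `adjust` repeatedly, once per measurement interval, gives the multiple incremental adjustments the abstract mentions, rather than a single jump to a new rate.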
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
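As a rough illustration of the composite-primary-key model described above, the following Python sketch (hypothetical names throughout, not the service's actual API) hashes a partition key to select a partition and keeps items within each partition ordered by a range key, so that a query can return filtered items:

```python
import bisect
import hashlib

class SimpleTable:
    """Hypothetical sketch of composite-primary-key partitioning: items are
    placed on a partition by hashing the hash key, and ordered within the
    partition by the range key."""

    def __init__(self, num_partitions=4):
        # Each partition maps hash_key -> sorted list of (range_key, item).
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, hash_key):
        digest = hashlib.md5(str(hash_key).encode()).hexdigest()
        return self.partitions[int(digest, 16) % len(self.partitions)]

    def put_item(self, hash_key, range_key, item):
        rows = self._partition_for(hash_key).setdefault(hash_key, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)        # overwrite existing item
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, predicate=lambda item: True):
        """Return the items stored under one hash key, optionally filtered."""
        rows = self._partition_for(hash_key).get(hash_key, [])
        return [item for _, item in rows if predicate(item)]

t = SimpleTable()
t.put_item("user1", "2024-01-01", {"score": 10})
t.put_item("user1", "2024-01-02", {"score": 7})
print(t.query("user1", predicate=lambda item: item["score"] > 8))
```

A flexible schema falls out of this layout naturally: each stored item is just a dictionary, so items under the same key need not share attributes beyond the primary key itself.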
Abstract:
A first transaction manager of a partitioned storage group stores a first conditional commit record for a first write of a multi-partition transaction based on a first conflict detection operation. A second transaction manager stores a second conditional commit record for a second write of the transaction based on a second conflict detection operation. A client-side component of the storage group determines that both writes have been conditionally committed, and stores an unconditional commit record in a commit decision repository. A write applier examines the first conditional commit record and the unconditional commit record before propagating the first write to the first partition.
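The conditional/unconditional commit sequence described here can be sketched in a few lines. This is a simplified, hypothetical Python model (names and data structures are invented for illustration) of the three roles: per-partition transaction managers storing conditional commit records after conflict detection, a client-side component storing the unconditional commit record once all partitions have conditionally committed, and a write applier that checks both records before propagating a write.

```python
conditional_commits = {}   # (txn_id, partition) -> write payload
commit_decisions = set()   # txn_ids with an unconditional commit record
partitions = {"p1": {}, "p2": {}}

def conditional_commit(txn_id, partition, key, value, has_conflict):
    """Transaction manager for one partition: store a conditional commit
    record only if the conflict detection operation passes."""
    if has_conflict(partition, key):
        return False
    conditional_commits[(txn_id, partition)] = (key, value)
    return True

def decide(txn_id, involved_partitions):
    """Client-side component: once every involved partition has
    conditionally committed, store the unconditional commit record
    in the commit decision repository."""
    if all((txn_id, p) in conditional_commits for p in involved_partitions):
        commit_decisions.add(txn_id)

def apply_write(txn_id, partition):
    """Write applier: propagate a write to its partition only if both the
    conditional and the unconditional commit records exist."""
    record = conditional_commits.get((txn_id, partition))
    if record and txn_id in commit_decisions:
        key, value = record
        partitions[partition][key] = value

no_conflict = lambda partition, key: False
conditional_commit("t1", "p1", "a", 1, no_conflict)
conditional_commit("t1", "p2", "b", 2, no_conflict)
decide("t1", ["p1", "p2"])
apply_write("t1", "p1")   # partitions["p1"] == {"a": 1}
```

The point of the two-record check is that a conditional commit alone proves nothing about the transaction as a whole; the applier must also see the unconditional decision before the write becomes visible in the partition.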
Abstract:
Methods and apparatus for resource silos at network-accessible services are disclosed. A subset of resources used for a database service, including at least one resource from each of a plurality of data centers, is selected for membership in a resource silo based on grouping criteria. A silo routing layer node identifies the resource silo as the target silo to which a client work request is to be directed. The client work request is sent to a front-end resource of the target silo either by the client or by the silo routing layer node on behalf of the client. The front-end resource of the target silo transmits a representation of the work request to a back-end resource of the target silo, where a work operation corresponding to the request is performed.
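A hypothetical sketch of the grouping and routing steps may make the flow concrete. In the Python below (all names, grouping criteria, and data-center labels are invented for illustration), resources drawn from several data centers are grouped into silos, and a routing-layer function picks the target silo for a client work request by hashing the client identifier:

```python
import hashlib
from collections import defaultdict

def build_silos(resources, group_key):
    """Group resources into silos based on a grouping criterion; each silo
    here includes resources from more than one data center."""
    silos = defaultdict(list)
    for res in resources:
        silos[group_key(res)].append(res)
    return silos

def route(silos, client_id):
    """Silo routing layer node: identify the target silo for a client
    work request (here, by hashing the client id)."""
    names = sorted(silos)
    digest = int(hashlib.md5(client_id.encode()).hexdigest(), 16)
    return silos[names[digest % len(names)]]

resources = [
    {"dc": "dc1", "tier": "front-end", "group": "silo-a"},
    {"dc": "dc2", "tier": "back-end",  "group": "silo-a"},
    {"dc": "dc1", "tier": "front-end", "group": "silo-b"},
    {"dc": "dc2", "tier": "back-end",  "group": "silo-b"},
]
silos = build_silos(resources, group_key=lambda r: r["group"])
target = route(silos, client_id="client-42")
front_end = next(r for r in target if r["tier"] == "front-end")
back_end = next(r for r in target if r["tier"] == "back-end")
# The front-end resource would transmit a representation of the work
# request to the back-end resource, where the operation is performed.
```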
Abstract:
Methods and apparatus for token-based admission control for replicated writes are disclosed. Data objects are divided into partitions, and corresponding to each partition, at least a master replica and a slave replica are stored. A determination as to whether to accept a write request directed to the partition is made based at least in part on one or more of (a) available throughput capacity at the master replica, and (b) an indication, obtained using a token-based protocol, of available throughput capacity at the slave replica. If the write request is accepted, one or more data modification operations are initiated.
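The admission decision described above combines a local check with a remotely reported one. The following minimal Python sketch (hypothetical names; the token-refresh protocol itself is elided) accepts a write only when the master replica has throughput capacity and the slave replica's most recently reported token count, obtained via the token-based protocol, also shows capacity:

```python
class Replica:
    def __init__(self, tokens):
        self.tokens = tokens  # available throughput capacity, in tokens

    def consume(self, cost=1):
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

def accept_write(master, slave_reported_tokens, cost=1):
    """Admission control at the master: check local capacity and the
    slave's reported token count before accepting the write. In a real
    system the slave's count would be refreshed periodically via the
    token-based protocol rather than passed in directly."""
    if slave_reported_tokens < cost:
        return False          # slave lacks capacity; reject up front
    if not master.consume(cost):
        return False          # master itself lacks capacity
    # Accepted: the data modification operations would be initiated here,
    # applied at the master and propagated to the slave.
    return True

master = Replica(tokens=10)
print(accept_write(master, slave_reported_tokens=5))  # True
print(accept_write(master, slave_reported_tokens=0))  # False
```

Checking the slave's capacity before consuming the master's token avoids burning local capacity on writes that would be rejected anyway.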
Abstract:
A system that implements a data storage service may maintain tables in a data store on behalf of clients. The service may maintain table data in multiple replicas of partitions of the data that are stored on respective computing nodes in the system. In response to detecting a failure or fault condition, or receiving a service request from a client to move or copy a partition replica, the data store may copy a partition replica to another computing node using a physical copy mechanism. The physical copy mechanism may copy table data from physical storage locations in which it is stored to physical storage locations allocated to a destination replica on the other computing node. During copying, service requests to modify table data may be logged and applied to the replica being copied. A catch-up operation may be performed to apply modification requests received during copying to the destination replica.
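The copy-then-catch-up flow can be sketched briefly. In this hypothetical Python model (invented names; the physical copy is stood in for by a snapshot), the position of the modification log is recorded when copying begins, and the catch-up step replays every modification logged while the copy was in progress onto the destination replica:

```python
def physical_copy(source_storage):
    """Stand-in for the physical copy mechanism, which would copy table
    data between physical storage locations; here, a plain snapshot."""
    return dict(source_storage)

def copy_partition_replica(source, mod_log, log_offset):
    """Copy the source replica, then apply the catch-up operation: replay
    modifications logged at or after log_offset, i.e. those received
    while copying was in progress."""
    destination = physical_copy(source)
    for key, value in mod_log[log_offset:]:
        destination[key] = value
    return destination

source = {"item1": "a"}
mod_log = []
offset = len(mod_log)              # copying begins here
mod_log.append(("item2", "b"))     # modification logged during the copy
dest = copy_partition_replica(source, mod_log, offset)
print(dest)                        # {'item1': 'a', 'item2': 'b'}
```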
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas that are stored on respective computing nodes in the system. Updates to the stored data and to the membership of replica groups are propagated as replicated log records. A replica receiving a log record may compare metadata in the received log record to corresponding metadata in a log record that was previously appended to its log to determine a response. The metadata may include a sequence number, a lock generation identifier, an epoch identifier, or an indication of an epoch change. The replica may append the received log record to its log, drop the received log record, or cache the received log record for future use. If a log conflict indicates an invalid log stream branch, one or more log records may be deleted.
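The append/drop/cache decision the abstract describes can be sketched as a comparison between the incoming record's metadata and the tail of the local log. The Python below is a simplified, hypothetical model using only a sequence number and an epoch identifier (the real metadata also includes lock generation identifiers and epoch-change indications):

```python
def receive_log_record(log, record):
    """log is a list of dicts with 'seq', 'epoch', and 'data' fields,
    ordered by sequence number. Returns the action taken."""
    tail = log[-1] if log else None
    if tail is None or record["seq"] == tail["seq"] + 1:
        if tail and record["epoch"] < tail["epoch"]:
            return "drop"              # stale epoch: ignore the record
        log.append(record)
        return "append"
    if record["seq"] <= tail["seq"]:
        if record["epoch"] > tail["epoch"]:
            # Conflict reveals an invalid log stream branch: delete the
            # conflicting suffix before appending from the newer epoch.
            idx = max(record["seq"] - log[0]["seq"], 0)
            del log[idx:]
            log.append(record)
            return "truncate-and-append"
        return "drop"                  # duplicate of a record already held
    return "cache"                     # gap in the sequence: hold for later

log = [{"seq": 1, "epoch": 1, "data": "a"}]
print(receive_log_record(log, {"seq": 2, "epoch": 1, "data": "b"}))  # append
print(receive_log_record(log, {"seq": 5, "epoch": 1, "data": "e"}))  # cache
print(receive_log_record(log, {"seq": 2, "epoch": 2, "data": "B"}))  # truncate-and-append
```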
Abstract:
At a client-side component of a storage group, a read descriptor generated in response to a read request directed to a first data store is received. The read descriptor includes a state transition indicator corresponding to a write that has been applied at the first data store. A write descriptor indicative of a write that depends on a result of the read request is generated at the client-side component. The read descriptor and the write descriptor are included in a commit request for a candidate transaction at the client-side component, and transmitted to a transaction manager.
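A small sketch shows how the pieces named above fit into one commit request. The Python below is hypothetical (the descriptor fields and names are invented): the read descriptor carries the state transition indicator returned by the data store, the write descriptor records a write computed from the read result, and both are bundled into the candidate transaction's commit request:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReadDescriptor:
    data_store: str
    key: str
    state_transition: int   # e.g. sequence of the last write applied at read time

@dataclass
class WriteDescriptor:
    data_store: str
    key: str
    value: object

@dataclass
class CommitRequest:
    reads: List[ReadDescriptor]
    writes: List[WriteDescriptor]

def build_commit_request(read_desc, computed_value):
    """Client-side component: describe a write that depends on the read
    result and pair it with the read descriptor in one commit request."""
    write_desc = WriteDescriptor(read_desc.data_store, read_desc.key,
                                 computed_value)
    return CommitRequest(reads=[read_desc], writes=[write_desc])

rd = ReadDescriptor("store-1", "balance", state_transition=41)
req = build_commit_request(rd, computed_value=100)
# req would be transmitted to the transaction manager, which can use the
# state transition indicator to check whether the read is still current
# before committing the candidate transaction.
```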
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
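The failover attempt described above can be sketched as a short procedure. This hypothetical Python model (invented names; a local `threading.Lock` stands in for the lock associated with the replica group) acquires the lock, gathers peer state, requires synchronization if any peer holds more recent data, and promotes the candidate only if the failover quorum is large enough:

```python
import threading

def attempt_failover(candidate, peers, lock, quorum_size):
    """Each replica is modeled as a dict with 'latest_seq' (how recent its
    data is) and 'supports' (whether it backs this candidate's attempt)."""
    if not lock.acquire(blocking=False):   # lock guards the replica group
        return "abort: lock held by another attempt"
    try:
        quorum = [p for p in peers if p["supports"]]
        behind = any(p["latest_seq"] > candidate["latest_seq"]
                     for p in peers)
        if behind:
            return "synchronize first: a peer holds more recent data"
        if len(quorum) + 1 >= quorum_size:  # +1 counts the candidate itself
            return "candidate becomes master"
        return "abort: failover quorum not reached"
    finally:
        lock.release()

lock = threading.Lock()
peers = [{"latest_seq": 10, "supports": True},
         {"latest_seq": 10, "supports": True}]
candidate = {"latest_seq": 10}
print(attempt_failover(candidate, peers, lock, quorum_size=2))
```

Checking recency before counting the quorum matters: a supporter with newer data than the candidate means the candidate cannot safely become master until it synchronizes.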