Abstract:
Techniques are described for managing distributed execution of programs. In some situations, the techniques include determining configuration information to be used for executing a particular program in a distributed manner on multiple computing nodes and/or include providing information and associated controls to a user regarding ongoing distributed execution of one or more programs to enable the user to modify the ongoing distributed execution in various manners. Determined configuration information may include, for example, configuration parameters such as a quantity of computing nodes and/or other measures of computing resources to be used for the executing, and may be determined in various manners, including by interactively gathering values for at least some types of configuration information from an associated user (e.g., via a GUI that is displayed to the user) and/or by automatically determining values for at least some types of configuration information (e.g., for use as recommendations to a user).
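For illustration only, a minimal sketch of how such configuration information might be gathered interactively and merged with automatically determined recommendations is shown below; the ClusterConfig structure, its field names, and the sizing heuristic are assumptions made for the sketch, not part of the described techniques.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterConfig:
    """Hypothetical configuration parameters for distributed execution of a program."""
    node_quantity: Optional[int] = None        # quantity of computing nodes
    memory_per_node_gb: Optional[int] = None   # other measures of computing resources
    cpu_cores_per_node: Optional[int] = None

def recommend_defaults(input_size_gb: float) -> ClusterConfig:
    """Automatically determine recommended values (illustrative heuristic only)."""
    nodes = max(1, int(input_size_gb // 64) + 1)
    return ClusterConfig(node_quantity=nodes, memory_per_node_gb=16, cpu_cores_per_node=4)

def gather_config(user_values: dict, input_size_gb: float) -> ClusterConfig:
    """Merge values gathered interactively from the user (e.g., via a GUI)
    with automatically determined recommendations for any values left unset."""
    config = recommend_defaults(input_size_gb)
    for name, value in user_values.items():
        if hasattr(config, name) and value is not None:
            setattr(config, name, value)
    return config

if __name__ == "__main__":
    # The user supplied only a node quantity; the rest fall back to recommendations.
    print(gather_config({"node_quantity": 10}, input_size_gb=500))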
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. Each table may include multiple items. Each item may include one or more attributes, each containing a name-value pair. Attribute values may be scalars or sets of numbers or strings. The system may provide an API usable to request that values of one or more of an item's attributes be updated. An update request may be conditional on expected values of one or more item attributes (e.g., the same or different item attributes). In response to a request to update the values of one or more item attributes, the previous values and/or updated values may be optionally returned for the updated item attributes or for all attributes of an item targeted by an update request. Items stored in tables may be indexed using a simple or composite primary key.
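As a hedged sketch of what such a conditional update might look like, the following in-memory model uses hypothetical names (update_item, expected, return_values) rather than the service's actual API.

# Hypothetical in-memory model of a conditional item update with optional
# return of previous or updated attribute values; all names are illustrative.
table = {
    ("user", "123"): {"name": "alice", "score": 10}   # item indexed by a composite primary key
}

def update_item(key, updates, expected=None, return_values="NONE"):
    """Update one or more attributes of an item, optionally conditional on
    expected values of item attributes (the same or different attributes)."""
    item = table[key]
    if expected:
        for attr, value in expected.items():
            if item.get(attr) != value:
                raise ValueError("conditional check failed on attribute %r" % attr)
    previous = dict(item)
    item.update(updates)
    if return_values == "ALL_OLD":
        return previous                                  # previous values of all attributes
    if return_values == "UPDATED_NEW":
        return {a: item[a] for a in updates}             # updated values of the updated attributes
    return None

# Conditional on the current score; ask for the updated attribute values back.
print(update_item(("user", "123"), {"score": 11},
                  expected={"score": 10}, return_values="UPDATED_NEW"))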
Abstract:
A system that implements detection and reconciliation of system resource metadata for a distributed storage system is described. A node may obtain resource metadata specific to the node from another node that maintains system resource metadata for a distributed storage system. Based on the resource metadata specific to the node, a determination may be made that the node is not reconciled with the system resource metadata. A corrective operation may be performed to reconcile the node with the system resource metadata. A corrective operation may include terminating a resource, making a resource unavailable, modifying resource attributes, or sending a resource metadata update to the system resource metadata for correction.
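The reconciliation flow might resemble the following sketch; the function names, data shapes, and corrective choices here are assumptions for illustration, not the described system's interfaces.

# Illustrative reconciliation of a node's local resources against system resource metadata.
def reconcile(node_id, fetch_node_metadata, local_resources, report_correction):
    """Compare local resources with the resource metadata maintained elsewhere
    and perform a corrective operation for each mismatch."""
    authoritative = fetch_node_metadata(node_id)        # resource metadata specific to this node
    for resource_id, resource in list(local_resources.items()):
        expected = authoritative.get(resource_id)
        if expected is None:
            # System metadata does not know this resource: terminate or make it unavailable.
            resource["state"] = "terminated"
        elif resource["attributes"] != expected["attributes"]:
            # Attributes drifted: modify them to match the system resource metadata.
            resource["attributes"] = dict(expected["attributes"])
    for resource_id in authoritative.keys() - local_resources.keys():
        # Node lacks a resource the system metadata expects: send an update for correction.
        report_correction(node_id, resource_id, "missing")

meta = {"vol-1": {"attributes": {"size": 100}}}
local = {"vol-1": {"attributes": {"size": 50}, "state": "active"},
         "vol-2": {"attributes": {}, "state": "active"}}
reconcile("node-a", lambda n: meta, local, lambda *args: print("report", args))
print(local)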
Abstract:
A system that implements a scalable data storage service may maintain tables in a data store on behalf of storage service clients. The service may maintain table data in multiple replicas of partitions that are stored on respective computing nodes in the system. In response to detecting an anomaly in the system, detecting a change in data volume on a partition or service request traffic directed to a partition, or receiving a service request from a client to split a partition, the data storage service may create additional copies of a partition replica using a physical copy mechanism. The data storage service may issue a split command defined in an API for the data store to divide the original and additional replicas into multiple replica groups, and to configure each replica group to maintain a respective portion of the table data that was stored in the partition before the split.
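A rough sketch of the split flow follows; the ReplicaGroup class, the string stand-in for the physical copy mechanism, and the split_partition helper are hypothetical names introduced only for this illustration.

from dataclasses import dataclass, field

@dataclass
class ReplicaGroup:
    key_range: tuple                       # portion of the table's key space held by the group
    replicas: list = field(default_factory=list)

def split_partition(group: ReplicaGroup, split_key):
    """Create additional copies of each replica (standing in for a physical copy
    mechanism), then divide the original and additional replicas into two replica
    groups, each maintaining a portion of the data stored before the split."""
    additional = [r + "-copy" for r in group.replicas]     # stand-in for physical copies
    low, high = group.key_range
    left = ReplicaGroup((low, split_key), group.replicas)  # original replicas keep the lower range
    right = ReplicaGroup((split_key, high), additional)    # copies take over the upper range
    return left, right

left, right = split_partition(ReplicaGroup(("a", "z"), ["replica-1", "replica-2"]), "m")
print(left, right, sep="\n")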
Abstract:
Techniques are described for managing distributed execution of programs. In some situations, the techniques include dynamically modifying the distributed program execution in various manners, such as based on monitored status information. The dynamic modifying of the distributed program execution may include adding and/or removing computing nodes from a cluster that is executing the program, modifying the amount of computing resources that are available for the distributed program execution, terminating or temporarily suspending execution of the program (e.g., if an insufficient quantity of computing nodes of the cluster are available to perform execution), etc.
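A minimal controller sketch of such dynamic modification based on monitored status information is shown below; the thresholds, field names, and adjust_cluster helper are assumptions, not the described techniques.

# Illustrative decision loop for dynamically modifying distributed program execution.
def adjust_cluster(cluster, status):
    """Decide, from monitored status information, whether to add or remove computing
    nodes, or to suspend execution of the program."""
    if status["available_nodes"] < status["minimum_required_nodes"]:
        cluster["state"] = "suspended"    # insufficient quantity of cluster nodes available
    elif status["average_utilization"] > 0.85:
        cluster["nodes"] += 1             # add a computing node to the cluster
    elif status["average_utilization"] < 0.25 and cluster["nodes"] > 1:
        cluster["nodes"] -= 1             # remove a computing node from the cluster
    return cluster

print(adjust_cluster({"nodes": 4, "state": "running"},
                     {"available_nodes": 4, "minimum_required_nodes": 2,
                      "average_utilization": 0.9}))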
Abstract:
Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status metadata, the number of available replicas for at least one replica group may be determined to be noncompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
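One way such scheduling could look is sketched below; the data shapes and the greedy cheapest-first ordering are assumptions made for the sketch, not the described system's scheduling policy.

# Illustrative scheduling of healing operations under a resource constraint.
def schedule_healing(replica_groups, healthy_count, capacity):
    """Find replica groups whose available replicas fall below the healthy state
    definition, then order healing operations so their resource requirements
    fit within the given resource constraint (capacity)."""
    ops = []
    for group, available in replica_groups.items():
        missing = healthy_count - available
        if missing > 0:                     # group is not compliant with the healthy state
            ops.append({"group": group, "required": missing})
    # Cheapest-first ordering restores as many groups as possible within capacity.
    ops.sort(key=lambda op: op["required"])
    scheduled, used = [], 0
    for op in ops:
        if used + op["required"] <= capacity:
            scheduled.append(op)
            used += op["required"]
    return scheduled

print(schedule_healing({"g1": 3, "g2": 1, "g3": 2}, healthy_count=3, capacity=2))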
Abstract:
Methods and apparatus for configurable-capacity time-series tables are disclosed. A schedule of database table management operations, including at least an operation to change a throughput constraint associated with a table in response to a triggering event, is generated. The table is instantiated with an initial throughput constraint in accordance with the schedule. Work requests directed to the table are accepted based on the initial throughput constraint. The throughput constraint is modified in response to the triggering event. Subsequent work requests are accepted based on the modified throughput constraint.
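A small sketch of such a schedule and its effect on accepted work requests follows; the trigger names, throughput numbers, and TimeSeriesTable class are illustrative assumptions only.

# Hypothetical schedule of table management operations for a time-series table.
schedule = [
    {"trigger": "table_created",  "write_limit": 1000, "read_limit": 200},
    {"trigger": "period_elapsed", "write_limit": 10,   "read_limit": 200},  # table goes "cold"
]

class TimeSeriesTable:
    def __init__(self, schedule):
        self.schedule = schedule
        self.throughput = None

    def on_event(self, trigger):
        """Apply the throughput constraint associated with a triggering event."""
        for entry in self.schedule:
            if entry["trigger"] == trigger:
                self.throughput = (entry["write_limit"], entry["read_limit"])

    def accept(self, requests_per_second, kind="write"):
        """Accept work requests only up to the current throughput constraint."""
        limit = self.throughput[0] if kind == "write" else self.throughput[1]
        return requests_per_second <= limit

table = TimeSeriesTable(schedule)
table.on_event("table_created")       # instantiate with the initial throughput constraint
print(table.accept(500))              # True: within the initial constraint
table.on_event("period_elapsed")      # triggering event modifies the constraint
print(table.accept(500))              # False: beyond the reduced constraint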
Abstract:
Techniques are described for managing distributed execution of programs, including by dynamically scaling a cluster of multiple computing nodes performing ongoing distributed execution of a program, such as to increase and/or decrease computing node quantity. An architecture may be used that has core nodes that each participate in a distributed storage system for the distributed program execution, and that has one or more other auxiliary nodes that do not participate in the distributed storage system. Furthermore, as part of performing the dynamic scaling of a cluster, computing nodes that are only temporarily available may be selected and used, such as computing nodes that might be removed from the cluster during the ongoing program execution to be put to other uses and that may also be available for a different fee (e.g., a lower fee) than other computing nodes that are available throughout the ongoing use of the cluster.
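The core/auxiliary distinction might be modeled as in the sketch below; the ClusterNode class, its fields, and the scale-down preference shown are assumptions for illustration rather than the described architecture's interfaces.

from dataclasses import dataclass

@dataclass
class ClusterNode:
    name: str
    is_core: bool            # core nodes participate in the distributed storage system
    temporary: bool = False  # only temporarily available (may be reclaimed), possibly at a lower fee

def scale_down(nodes, count):
    """Remove up to `count` nodes, preferring temporarily available auxiliary nodes
    and never removing core nodes that hold distributed storage system data."""
    removable = [n for n in nodes if not n.is_core]
    removable.sort(key=lambda n: not n.temporary)    # temporary auxiliary nodes go first
    to_remove = removable[:count]
    return [n for n in nodes if n not in to_remove]

cluster = [ClusterNode("core-1", True), ClusterNode("aux-1", False),
           ClusterNode("aux-2", False, temporary=True)]
print([n.name for n in scale_down(cluster, 1)])      # the temporary auxiliary node is removed first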