Abstract:
Providers of web services and other types of software as a service may be subject to service-level agreements requiring that response times be within a defined range. For efficiency, multiple services may be hosted on the same set of computing nodes, which may jeopardize adherence to service-level agreements. A control system may involve classifying service requests and determining desired values for measurements such as latency. An error value may be calculated based on the difference between measured and desired values. A controller may adjust a rate of capacity utilization for the computing nodes based on the current error, a history of past errors, and a prediction of future errors.
Abstract:
Implementations detailed herein include description of a computer-implemented method to migrate a machine learning model from one accelerator portion (such as a portion of a graphical processor unit (GPU)) to a different accelerator portion. In some instances, a state of the first accelerator portion is persisted, the second accelerator portion is configured, the first accelerator portion is then detached from a client application instance, and at least a portion of an inference request is performed using the loaded at least a portion of the machine learning model on the second accelerator portion that had been configured.
Abstract:
A repository of key-value data may store a first object value having an internal structure of a hierarchy of sub-objects. The repository may receive a request to modify the first object, expressed as a projection of locations in the object to be updated and a function that, upon evaluation, returns values to be used to update the projected locations of the object. The repository may determine that the locations specified by the projections correspond to non-overlapping regions of the object and, based on the determination, update the object using the results of evaluating the function.
Abstract:
One or more table partitions may communicate with an index partition that may be a master of a replication group. A communications channel may exist between table partitions and the index partition. Upon splitting the index partition, communications between the table partitions and the index partition may be suspended. Upon completion of the split, communications may be reestablished between the table partitions and a partition, of the replication group of index partitions, designated to be a master following the split. Messages accumulated by the table partitions during the split may be sent to the index partition upon reestablishing communications.
Abstract:
A distributed database management system may comprise a plurality of computing nodes. A request to update an item maintained by the system may be acknowledged as durable and committed once an entry corresponding to the request has been written to a log file and quorum among the computing nodes has been achieved. Improved consistency may be achieved by maintaining snapshots of committed item states within queryable in-memory snapshot data structures. Range queries may be performed by merging a secondary index with the snapshots and applying filters. Projections may be completed by retrieving additional data from an item collection maintain on one or more storage devices.
Abstract:
A distributed database management system may comprise a plurality of computing nodes. A request to update an item maintained by the system may be acknowledged as durable and committed once an entry corresponding to the request has been written to a log file and quorum among the computing nodes has been achieved. Improved consistency may be achieved by maintaining snapshots of committed item states within queryable in-memory snapshot data structures. Range queries may be performed by merging a secondary index with the snapshots and applying filters. Projections may be completed by retrieving additional data from an item collection maintain on one or more storage devices.
Abstract:
A data storage system may implement implicit prioritization to rate-limit secondary index creation for an online table. A secondary index may be generated for a table stored in a data store. The table may be incrementally indexed, performing multiple indexing operations to populate the secondary index. Prior to performing an indexing operation, an evaluation of a capacity limitation for performing indexing operations may be made with respect to capacity to process access requests at the data store. If a determination is made that performance of the indexing operation exceeds the capacity limitation, then the indexing operation may be throttled. If a determination is made that performance of the indexing operation does not exceed the capacity limitation, then the indexing operation may be performed.
Abstract:
A system that implements a data storage service may store data for a database table in multiple replicated partitions on respective storage nodes. In response to a request to back up a table, the service may back up individual partitions of the table to a remote storage system independently and (in some cases) in parallel, and may update (or create) and store metadata about the table and its partitions on storage nodes of the data storage service and/or in the remote storage system. Backing up each partition may include exporting it from the database in which the table is stored, packaging and compressing the exported partition for upload, and uploading the exported, packaged, and compressed partition to the remote storage system. The remote storage system may be a key-value durable storage system in which each backed-up partition is accessible using its partition identifier as the key.
Abstract:
Methods and apparatus for equitable distribution of excess shared-resource throughput capacity are disclosed. A first and a second work target are configured to access a shared resource to implement accepted work requests. Admission control is managed at the work targets using respective token buckets. A first metric indicative of the work request arrival rates at the work targets during a time interval, and a second metric associated with the provisioned capacities of the work targets are determined. A number of tokens determined based on a throughput limit of the shared resource is distributed among the work targets to be used for admission control during a subsequent time interval. The number of tokens distributed to each work target is based on the first metric and/or the second metric.
Abstract:
A database service may maintain tables on behalf of clients and may provision throughput capacity for those tables. A table may be divided into multiple partitions, according to hash of the primary key values for each of the items in the table, and the items in the table may be accessed using the hash of their primary key values. Provisioned throughput capacity for the table may be divided between the partitions and used in servicing requests directed to items in the table. The service (or underlying system) may provide mechanisms for generating skew-related metrics or reports and presenting them to clients via a graphical user interface (GUI). The metrics and reports may indicate the amount of uniformity or skew in the distribution of requests across the key space for the table using histograms, heat maps, or other representations. Clients may initiate actions to correct any skewing via the GUI.