Abstract:
Techniques for making storage of data objects eventually durable using redundancy encoding are described herein. Data objects are stored in a first set of data storage devices with a first durability. After a predetermined length of time, the data objects are converted to data shards and distributed to a second set of data storage devices with a second durability that is distinct from the first durability.
Abstract:
Erasure encoded fragments are generated by an erasure encoding scheme, represented by an erasure encoding matrix, operating on a data file. A new erasure encoded fragment may be generated from previously-generated erasure encoded fragments without reconstructing the original data file. Available and valid erasure encoded fragments are identified and a set of those fragments is selected. A composite encoding matrix is generated based upon the selected fragments and the fragment specified to be generated. The composite matrix is applied to the selected fragments to produce a plurality of partial sums. The partial sums are then combined to generate the specified fragment. The partial sums may be produced by different devices so as to distribute the computational workload and/or to reduce network traffic. The integrity of a generated fragment may be verified by generating the specified fragment twice, using two different sets of fragments, and then comparing the two results.
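The partial-sum approach described above can be sketched as follows. This is a minimal illustration, not the method of the abstract itself: it assumes a Vandermonde-style encoding matrix over the small prime field GF(257), treats each fragment as a single symbol, and uses hypothetical helper names (`encoding_row`, `composite_row`, `partial_sum`). The composite row is the target fragment's encoding row multiplied by the inverse of the selected fragments' rows, so the target fragment is produced without rebuilding the data file.

```python
P = 257  # small prime modulus, an assumption made only to keep the arithmetic readable


def encoding_row(index, k):
    """Row `index` of a Vandermonde-style encoding matrix over GF(P)."""
    x = index + 1
    return [pow(x, j, P) for j in range(k)]


def encode(data, index):
    """Fragment `index` is the encoding row applied to the k data symbols."""
    row = encoding_row(index, len(data))
    return sum(c * d for c, d in zip(row, data)) % P


def invert(matrix):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [int(r == c) for c in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def composite_row(selected, target):
    """Coefficients that map the selected fragments directly to the target fragment."""
    k = len(selected)
    inverse = invert([encoding_row(i, k) for i in selected])
    target_row = encoding_row(target, k)
    return [sum(target_row[m] * inverse[m][s] for m in range(k)) % P for s in range(k)]


def partial_sum(coefficients, fragments):
    """Work done by one device: a weighted sum over the fragments it holds."""
    return sum(c * f for c, f in zip(coefficients, fragments)) % P


data = [10, 20, 30]                                  # k = 3 data symbols
fragments = {i: encode(data, i) for i in range(6)}   # six fragments in total

# Regenerate fragment 4 from fragments 0, 2 and 5 held on two devices.
coeffs = composite_row([0, 2, 5], 4)
sum_a = partial_sum(coeffs[:2], [fragments[0], fragments[2]])  # device A
sum_b = partial_sum(coeffs[2:], [fragments[5]])                # device B
rebuilt = (sum_a + sum_b) % P

# Integrity check: regenerate the same fragment from a different set and compare.
check = partial_sum(composite_row([1, 2, 3], 4),
                    [fragments[1], fragments[2], fragments[3]])
```

In this sketch only the two partial sums cross the network, not the selected fragments themselves, which is the traffic reduction the abstract refers to.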
Abstract:
Techniques for incrementally increasing media size in data storage systems using grid encoded data storage techniques are described herein. A grid of shards is created in which each shard has a first index and a second index, and each shard has an associated storage device whose storage capacity is large enough to store the largest set of data on a shard. Upon determining to replace the storage devices of the grid with storage devices that have a different storage capacity, the storage devices can be incrementally replaced within the grid by first padding each shard of the grid of shards with a set of data values, then replacing a data shard storage device with a device of the different storage capacity, and then replacing a set of derived shard storage devices with devices of the different storage capacity.
Abstract:
A data storage service receives a request to perform an operation in a data storage system that consists of many data storage devices, each device having a corresponding set of devices that may cause interference. The data storage service determines a manner in which to perform the operation while evaluating the current activity state of the devices that may cause interference. The data storage service can perform the operation in the determined manner.
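One way to picture the decision described above is the sketch below. It assumes that the "manner" of performing an operation is reduced to two hypothetical options, running immediately or deferring, and that the interference relationships are available as a simple mapping. The function and device names are illustrative only.

```python
def plan_operation(device, interferers, active):
    """Return (manner, blocking_devices) for an operation on `device`.

    interferers: maps a device to the set of devices that may interfere with it.
    active: the set of devices that are currently busy.
    """
    blocking = sorted(interferers.get(device, set()) & active)
    if not blocking:
        return "run_now", blocking
    return "defer", blocking


# Hypothetical topology: disk-1 is disturbed by disk-2 and disk-3, disk-4 by disk-5.
interferers = {"disk-1": {"disk-2", "disk-3"}, "disk-4": {"disk-5"}}
active = {"disk-3"}

decision_1 = plan_operation("disk-1", interferers, active)
decision_4 = plan_operation("disk-4", interferers, active)
```

A fuller policy could return other manners, such as a reduced-rate operation, but the abstract does not specify them.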
Abstract:
A data storage system includes one or more hard disk drive systems and an air moving device. The hard disk drive systems may include one or more drive mechanical modules that store data and a drive control module electrically coupled to the drive mechanical modules. The drive control module may control mechanical operations in the drive mechanical modules. The drive control module includes a printed circuit board assembly and heat producing components coupled to the printed circuit board assembly. Air passages on each side of the printed circuit board assembly allow a stream of air to flow across heat producing components on at least one side of the printed circuit board assembly.
Abstract:
A system stores data, such as sensor data or other operational data, on a plurality of storage volumes in a sequence so as to allow for interpolations or other approximations of the data using a subset of the storage volumes in response to a request for information regarding that data. For example, a plurality of devices connect to the system to provide operational data, which is then stored in a specified sequence on a specified set of volumes. In response to a request for operational information regarding some or all of the devices, the system reads at least one of the volumes, and approximates the values of the data over a specified period of time. In some embodiments, the data may be buffered prior to storage, and a jitter analyzer determines whether the incoming data is anomalous relative to a baseline, which may be determined using related data sets.
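The storage sequence described above can be illustrated with a short sketch. It assumes a round-robin placement of samples across volumes and linear interpolation between the samples found on a single volume. Both choices, and the names `stripe` and `approximate`, are assumptions for illustration; the abstract does not fix a particular sequence or approximation method.

```python
from bisect import bisect_left


def stripe(samples, volume_count):
    """Place sample i on volume i % volume_count, keeping its sequence index."""
    volumes = [[] for _ in range(volume_count)]
    for index, value in enumerate(samples):
        volumes[index % volume_count].append((index, value))
    return volumes


def approximate(volume, index):
    """Estimate the sample at `index` using only the samples on one volume."""
    positions = [i for i, _ in volume]
    at = bisect_left(positions, index)
    if at < len(volume) and positions[at] == index:
        return float(volume[at][1])          # exact sample is on this volume
    if at == 0:
        return float(volume[0][1])           # before the first sample: hold
    if at == len(volume):
        return float(volume[-1][1])          # after the last sample: hold
    (i0, v0), (i1, v1) = volume[at - 1], volume[at]
    return v0 + (v1 - v0) * (index - i0) / (i1 - i0)


samples = [2 * i for i in range(12)]         # illustrative sensor readings
volumes = stripe(samples, 3)                 # volume 0 holds indices 0, 3, 6, 9
estimate = approximate(volumes[0], 4)        # interpolated between indices 3 and 6
```

Reading one of three volumes answers the request with a third of the I/O, at the cost of approximating the samples stored elsewhere.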
Abstract:
A hub device or edge device implements adaptive data compression. A machine learning model of the hub device receives time-series data from one or more data sources and classifies respective portions of the time-series data as respective patterns. A data compressor at the hub device generates compressed data by applying different compression techniques to the respective portions of the time-series data according to a mapping of the compression techniques to the respective patterns. The hub device then transmits the compressed data to an endpoint for processing (e.g., another device that uses the compressed data). The hub device receives feedback for the compressed data. In response to the feedback, the hub device changes one or more of the compression techniques that are mapped to the respective patterns.
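The pattern-to-technique mapping described above might look like the following sketch. The classifier here is a trivial rule standing in for the machine learning model, and the pattern names, the three compression functions, and the `HubCompressor` class are all hypothetical. Only `zlib` and `json` from the Python standard library are real APIs.

```python
import json
import zlib


def classify(portion):
    """Stand-in for the machine learning model: label a portion by its pattern."""
    return "flat" if len(set(portion)) == 1 else "varying"


def run_length(portion):
    """Encode a constant portion as [value, count]."""
    return json.dumps([portion[0], len(portion)]).encode()


def deflate(portion):
    """General-purpose compression of the serialized portion."""
    return zlib.compress(json.dumps(portion).encode())


def raw(portion):
    """No compression, only serialization."""
    return json.dumps(portion).encode()


class HubCompressor:
    def __init__(self):
        # Mapping of patterns to compression techniques.
        self.mapping = {"flat": run_length, "varying": deflate}

    def compress(self, portions):
        """Return (pattern, technique name, payload) for each portion."""
        results = []
        for portion in portions:
            pattern = classify(portion)
            technique = self.mapping[pattern]
            results.append((pattern, technique.__name__, technique(portion)))
        return results

    def apply_feedback(self, pattern, technique):
        """Change the technique mapped to a pattern in response to feedback."""
        self.mapping[pattern] = technique


hub = HubCompressor()
first = hub.compress([[5, 5, 5, 5], [1, 9, 3, 7]])
hub.apply_feedback("varying", raw)           # e.g. the endpoint reports slow decoding
second = hub.compress([[1, 9, 3, 7]])
```

Keeping the mapping as plain data is what lets feedback change the compressor's behaviour without retraining the classifier.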
Abstract:
Techniques for encrypting data using a randomly selected data block from a set of data are described herein. An index indicates a subset of data within a data object. The data block is selected based at least in part on the index, an input to a cryptographic operation is generated from the data block, and the input to the cryptographic operation is provided to the cryptographic operation.
Abstract:
Techniques and methods for generating and implementing multiple layers of redundancy coded data are disclosed. For example, a redundancy coding scheme may include data elements that include data that is unencoded relative to the input, yet may still fully participate in providing redundancy to any data element in a given set. In a layered scheme, the input may include a bundle or group of encoded (or unencoded) data elements, thereby nesting two or more layers of redundancy coding. The specific amount of redundancy generated by such a scheme may be adjusted and adapted to failure characteristics of the entity on which the data elements are stored.
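A layered scheme of this kind can be sketched with a single XOR parity per layer. This is a deliberately simple stand-in for whatever redundancy code an implementation would use: the data elements are stored unencoded, one parity element is added, and an entire inner bundle then becomes one element of an outer layer. The function names are hypothetical, and all elements in a layer are assumed to have equal length.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_layer(elements):
    """Keep the input elements unencoded and append one XOR parity element."""
    return list(elements) + [reduce(xor_bytes, elements)]


def recover(bundle, missing):
    """Rebuild the element at position `missing` from all the others."""
    present = [e for i, e in enumerate(bundle) if i != missing]
    return reduce(xor_bytes, present)


# Inner layer: two bundles, each with two unencoded elements plus parity.
inner_a = encode_layer([b"ab", b"cd"])
inner_b = encode_layer([b"ef", b"gh"])

# Outer layer: each serialized inner bundle is treated as one element.
outer = encode_layer([b"".join(inner_a), b"".join(inner_b)])

lost_element = recover(inner_a, 1)   # one element lost inside a bundle
lost_bundle = recover(outer, 0)      # a whole inner bundle lost
```

The amount of redundancy per layer could be tuned separately, for example more parity in the layer that spans the failure domain most likely to be lost.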
Abstract:
A data storage service stores a dataset on a set of storage nodes in accordance with a first encoding. A set of shards constituting a quorum, and one or more additional shards, are stored on the storage nodes. The data storage service determines to store the dataset according to a second encoding that has a greater number of shards. The data storage service reconfigures the storage of the dataset in accordance with the second encoding, forming the additional shards for the second encoding by combining portions of shards of the first encoding.