Abstract:
Techniques described and suggested herein include systems and methods for storing, indexing, and retrieving original data of data archives on data storage systems using redundancy coding techniques. For example, redundancy codes, such as erasure codes, may be applied to archives (such as those received from a customer of a computing resource service provider) so as to allow the original data of the individual archives to be stored, and to remain available, on a minimum of volumes, such as those of a data storage system, while retaining the availability, durability, and other guarantees imparted by the application of the redundancy code. Sparse indexing techniques may be implemented so as to reduce the footprint of the indexes used to locate the original data once it is stored. The volumes may be apportioned into failure-decorrelated subsets, and archives stored thereto may be apportioned to such subsets.
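The abstract does not say how the sparse index is organized. As a minimal sketch of one common form, assume the archives on a volume are stored in sorted key order and the index keeps only every `stride`-th key. A lookup then finds the nearest preceding index entry and scans forward from there. The function names, the `stride` parameter, and the key format are all hypothetical.

```python
import bisect

def build_sparse_index(sorted_keys, stride):
    # Keep a (key, position) entry for every `stride`-th record only,
    # so the index is roughly 1/stride the size of a dense index.
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), stride)]

def lookup(sparse_index, sorted_keys, key):
    # Find the last index entry whose key is <= the requested key.
    index_keys = [k for k, _ in sparse_index]
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first stored record
    start = sparse_index[slot][1]
    if slot + 1 < len(sparse_index):
        end = sparse_index[slot + 1][1]
    else:
        end = len(sorted_keys)
    # Scan only the records between two neighbouring index entries.
    for pos in range(start, end):
        if sorted_keys[pos] == key:
            return pos
    return None

keys = [f"archive-{i:03d}" for i in range(10)]
index = build_sparse_index(keys, stride=4)
position = lookup(index, keys, "archive-006")
```

The trade-off in this sketch is a smaller index in exchange for a bounded scan of at most `stride` records per lookup.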
Abstract:
Techniques described and suggested herein include various methods and systems for verifying integrity of redundancy coded data, such as erasure coded data shards. In some embodiments, a quantity of redundancy coded data elements, hereafter referred to as data shards (e.g., erasure coded data shards), sufficient to reconstruct the original data element from which the redundancy coded data elements are derived, is used to generate reconstructed data shards to be used for checking the validity of analogous data shards stored for the original data element.
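The check described above can be illustrated with a deliberately small code. The sketch below assumes a single-parity XOR code (two data shards plus one parity shard, any two of which suffice to reconstruct), which stands in for whatever erasure code a real system would use. The helper names are hypothetical.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(d0, d1):
    # Shards 0 and 1 carry original data; shard 2 is their XOR parity.
    return [d0, d1, xor_bytes(d0, d1)]

def reconstruct(available):
    # `available` maps shard index -> bytes; any two shards are sufficient.
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available and 2 in available:
        return available[0], xor_bytes(available[0], available[2])
    if 1 in available and 2 in available:
        return xor_bytes(available[1], available[2]), available[1]
    raise ValueError("need at least two shards")

def verify(stored, quorum):
    # Rebuild the original data from the quorum, regenerate every shard,
    # and compare the regenerated shards with the stored ones outside the quorum.
    d0, d1 = reconstruct({i: stored[i] for i in quorum})
    regenerated = encode(d0, d1)
    return {i: regenerated[i] == stored[i]
            for i in range(len(stored)) if i not in quorum}

stored = encode(b"abcd", b"wxyz")
healthy = verify(stored, quorum=(0, 1))
stored[2] = b"\x00\x00\x00\x00"        # simulate a corrupted parity shard
damaged = verify(stored, quorum=(0, 1))
```

Note that in this sketch a mismatch only shows that the quorum and the checked shard disagree; deciding which side is corrupt would need a further check, such as repeating the test with a different quorum.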
Abstract:
Techniques for producing incremental short-term backups while minimizing media access are described herein. A backup request is received that specifies data to back up and a schedule for that backup. The data is then partitioned based on the schedule and, for each of the partitions, it is determined whether to store a full or an incremental backup of that partition. Each partition is fully backed up once during a cycle of backups and incrementally backed up at other times. With each full backup of a partition, a reverse delta that can be used to reconstruct the previous full backup of that partition is stored with the full backup.
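As a rough illustration of the schedule and the reverse delta, the sketch below assumes one partition per backup in the cycle and models a full backup as a dictionary from file name to contents. Both choices, and all function names, are assumptions made for the example rather than details taken from the abstract.

```python
_MISSING = object()  # marks a key that was absent from the previous full backup

def plan_cycle(num_partitions):
    # Backup number `step` takes a full backup of partition `step`
    # and an incremental backup of every other partition.
    return [["full" if p == step else "incremental"
             for p in range(num_partitions)]
            for step in range(num_partitions)]

def reverse_delta(new_full, old_full):
    # Record, for every key that differs, the value it had in the old backup.
    return {key: old_full.get(key, _MISSING)
            for key in new_full.keys() | old_full.keys()
            if new_full.get(key, _MISSING) != old_full.get(key, _MISSING)}

def apply_delta(full, delta):
    # Applying the reverse delta to the new full backup yields the old one.
    result = dict(full)
    for key, value in delta.items():
        if value is _MISSING:
            result.pop(key, None)
        else:
            result[key] = value
    return result

old = {"a.txt": b"1", "b.txt": b"2"}
new = {"a.txt": b"1", "b.txt": b"3", "c.txt": b"4"}
delta = reverse_delta(new, old)
restored = apply_delta(new, delta)
schedule = plan_cycle(3)
```

Storing the delta in the reverse direction means the newest full backup can be read directly, while older states are recovered by applying deltas backward.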
Abstract:
Techniques for incrementally increasing media size in data storage systems using grid encoded data storage techniques are described herein. A grid of shards is created in which each shard has a first index, a second index, and an associated storage device configured with a storage capacity large enough to store the largest set of data on a shard. Upon determining to replace the storage devices of the grid with storage devices that have a different storage capacity, the storage devices can be incrementally replaced within the grid by first padding each shard of the grid of shards with a set of data values, then replacing a data shard storage device with a device of the different storage capacity, and then replacing a set of derived shard storage devices with devices of the different storage capacity.
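One reason padding can come first is that, for a linear code, padding every shard with zeros leaves the derived shards consistent with the data shards. The sketch below shows this for an assumed grid in which the last shard of each row is the XOR of the data shards in that row. The grid layout, the zero padding value, and the function names are assumptions, not details from the abstract.

```python
from functools import reduce

def xor_shards(shards):
    # Byte-wise XOR of equally sized shards.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))

def build_grid(data_rows):
    # Shards are keyed by (first index, second index); the final column
    # of each row holds a derived (parity) shard.
    grid = {}
    for r, row in enumerate(data_rows):
        for c, shard in enumerate(row):
            grid[(r, c)] = shard
        grid[(r, len(row))] = xor_shards(row)
    return grid

def pad_grid(grid, new_size):
    # Extend every shard with zero bytes up to the new capacity.
    return {index: shard + bytes(new_size - len(shard))
            for index, shard in grid.items()}

def row_is_consistent(grid, row, data_cols):
    data = [grid[(row, c)] for c in range(data_cols)]
    return xor_shards(data) == grid[(row, data_cols)]

grid = build_grid([[b"ab", b"cd"], [b"ef", b"gh"]])
padded = pad_grid(grid, new_size=4)
still_valid = all(row_is_consistent(padded, r, data_cols=2) for r in range(2))
```

Because the parity of zero bytes is zero, no derived shard needs to be recomputed after padding in this sketch.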
Abstract:
Techniques described and suggested herein include systems and methods for precomputing regeneration information for data archives (“archives”) that have been processed and stored using redundancy coding techniques. For example, regeneration information, such as redundancy code-related matrices (such as inverted matrices based on, e.g., a generator matrix for the selected redundancy code) corresponding to subsets of the shards, is computed for each subset and, in some embodiments, stored for use in the event that one or more shards become unavailable, e.g., so as to regenerate a replacement shard more efficiently and/or quickly.
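The precomputation can be sketched with a toy code over a small prime field. The example below assumes a Vandermonde-style generator matrix modulo 257 and shards that are single integers. It inverts the generator submatrix for every k-sized subset of shards ahead of time, so that regenerating a lost shard later needs only two matrix-vector products. The field, the generator, and all names are assumptions chosen for brevity.

```python
from itertools import combinations

P = 257  # small prime modulus; an assumption made for this sketch

def generator(n, k):
    # Row i is [1, x, x^2, ...] with x = i + 1, so any k rows are invertible.
    return [[pow(x, j, P) for j in range(k)] for x in range(1, n + 1)]

def encode(G, data):
    return [sum(g * d for g, d in zip(row, data)) % P for row in G]

def invert(matrix):
    # Gauss-Jordan elimination modulo P on an augmented [matrix | identity].
    n = len(matrix)
    aug = [list(row) + [int(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def precompute(G, k):
    # One inverted submatrix per k-sized subset of shard indexes.
    return {subset: invert([G[i] for i in subset])
            for subset in combinations(range(len(G)), k)}

def regenerate(G, table, available, missing, k):
    # `available` maps shard index -> value; use the first k available shards.
    subset = tuple(sorted(available)[:k])
    inverse = table[subset]
    data = [sum(a * available[i] for a, i in zip(row, subset)) % P
            for row in inverse]
    return sum(g * d for g, d in zip(G[missing], data)) % P

G = generator(5, 3)
shards = encode(G, [10, 20, 30])
table = precompute(G, 3)
survivors = {i: shards[i] for i in (0, 2, 4)}
replacement = regenerate(G, table, survivors, missing=1, k=3)
```

The table grows with the number of subsets (10 for five shards taken three at a time here), so a larger code would likely store only the subsets it expects to need.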
Abstract:
If none of the fragments of an erasure-coded data file has become corrupted, then the original data file can be readily reconstructed. If one or more fragments of an erasure-coded data file have become corrupted, it may still be possible to find a combination of fragments that reconstructs the original data file, but the number of possible combinations may be impracticably large. If an attempt with a first set of fragments fails, an efficient approach is to use an independent set of fragments for the second attempt. Then, for further attempts, the results of the current attempt are compared with previous results. If a match is found, then the original data file has been reconstructed. An original data file may also be reconstructed by separately recovering each data block of the data file from corresponding fragment blocks and assembling the data file from the recovered data blocks.
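The compare-with-previous-results idea can be sketched with a toy code in which the data is a pair (a, b) and each fragment is a point (x, a + b*x mod 257), so any two fragments decode to a candidate result. The sketch tries disjoint pairs first, then the remaining pairs, and accepts the first result that two different attempts agree on. It is a simplification: it has no checksum to tell it that an attempt failed, and with several corrupted fragments a false match is possible. The code and all names are assumptions.

```python
from itertools import combinations

P = 257  # small prime modulus; an assumption made for this sketch

def decode_pair(f1, f2):
    # Solve y = a + b*x (mod P) through two fragments.
    (x1, y1), (x2, y2) = f1, f2
    b = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    a = (y1 - b * x1) % P
    return (a, b)

def attempt_order(n):
    # Independent (disjoint) fragment sets first, then every other pair.
    disjoint = [(i, i + 1) for i in range(0, n - 1, 2)]
    rest = [p for p in combinations(range(n), 2) if p not in disjoint]
    return disjoint + rest

def reconstruct(fragments):
    previous = set()
    for i, j in attempt_order(len(fragments)):
        result = decode_pair(fragments[i], fragments[j])
        if result in previous:
            return result  # two different fragment sets agree
        previous.add(result)
    return None

fragments = [(x, (42 + 7 * x) % P) for x in range(1, 6)]
fragments[0] = (1, 50)  # simulate one corrupted fragment
recovered = reconstruct(fragments)
```

Starting with disjoint sets matters because a single corrupted fragment can spoil at most one of them, so the second attempt does not repeat the first attempt's failure.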
Abstract:
Erasure encoded fragments are generated by an erasure encoding scheme, represented by an erasure encoding matrix, operating on a data file. A new erasure encoded fragment may be generated from previously-generated erasure encoded fragments without reconstructing the original data file. Available and valid erasure encoded fragments are identified and a set of those fragments is selected. A composite encoding matrix is generated based upon the selected fragments and the fragment specified to be generated. The composite matrix is applied to the selected fragments to produce a plurality of partial sums. The partial sums are then combined to generate the specified fragment. The partial sums may be produced by different devices so as to distribute the computational workload and/or to reduce network traffic. The integrity of a generated fragment may be verified by generating the specified fragment twice, using two different sets of fragments, and then comparing the two results.
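As a minimal sketch of the partial-sum approach, assume a polynomial code modulo 257 in which the fragment at position x is the data polynomial evaluated at x. For such a code, the composite row (the generator row of the new fragment multiplied by the inverse of the selected rows) reduces to Lagrange coefficients. Each device multiplies its own fragments by their coefficients and returns one partial sum, and the original data is never rebuilt. The code, the grouping of fragments by device, and all names are assumptions.

```python
P = 257  # small prime modulus; an assumption made for this sketch

def evaluate(data, x):
    # Fragment at position x: data[0] + data[1]*x + data[2]*x^2 + ... (mod P).
    return sum(d * pow(x, j, P) for j, d in enumerate(data)) % P

def composite_row(selected_xs, new_x):
    # Lagrange coefficients: the weight each selected fragment contributes
    # to the fragment at position new_x.
    row = []
    for xi in selected_xs:
        num, den = 1, 1
        for xj in selected_xs:
            if xj != xi:
                num = num * (new_x - xj) % P
                den = den * (xi - xj) % P
        row.append(num * pow(den, -1, P) % P)
    return row

def partial_sum(coeffs, values):
    # Work done locally by one device on the fragments it holds.
    return sum(c * v for c, v in zip(coeffs, values)) % P

def generate_fragment(selected, new_x, groups):
    # `selected` maps position x -> fragment value; `groups` lists the
    # positions held by each device.
    xs = sorted(selected)
    coeff = dict(zip(xs, composite_row(xs, new_x)))
    partials = [partial_sum([coeff[x] for x in group],
                            [selected[x] for x in group])
                for group in groups]
    return sum(partials) % P

data = [5, 11, 3]
frags = {x: evaluate(data, x) for x in (1, 2, 4, 5)}
first = generate_fragment({x: frags[x] for x in (1, 2, 4)}, 3, [[1, 2], [4]])
second = generate_fragment({x: frags[x] for x in (1, 2, 5)}, 3, [[1], [2, 5]])
```

Generating the fragment twice from two different fragment sets, as `first` and `second` do, mirrors the integrity check described above: the two results should be equal.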
Abstract:
A data center may include a tape library rack module along with rack computer systems. The rack computer systems may be configured to provide computing capacity within a data center environment. In some embodiments, the tape library rack module may include an enclosure encompassing an interior of the tape library rack module, a rack within the interior, and a tape library unit mounted on the rack. The tape library unit may include tape cartridges configured to store data within a tape environment that is different from the data center environment. The tape library unit may be within a portion of the interior that is enclosed such that it is environmentally isolated from the data center environment. In some examples, the tape library rack module may include a cooling unit and/or a humidifier unit, which may provide the tape environment to the environmentally isolated portion of the interior of the tape library rack module.