Abstract:
Techniques described and suggested herein include systems and methods for precomputing regeneration information for data archives (“archives”) that have been processed and stored using redundancy coding techniques. For example, regeneration information, such as redundancy code-related matrices (e.g., inverted matrices based on a generator matrix for the selected redundancy code) corresponding to subsets of the shards, is computed for each subset and, in some embodiments, stored for use in the event that one or more shards become unavailable, e.g., so as to regenerate a replacement shard more efficiently and/or quickly.
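The precomputation described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a toy code with 2 data symbols and 4 shards, a Vandermonde-style generator matrix, and exact rational arithmetic in place of the finite-field arithmetic a production redundancy code would normally use. All names (`GENERATOR`, `PRECOMPUTED`, `regenerate`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

K, N = 2, 4  # assumed toy parameters: 2 data symbols, 4 shards

# Assumed generator matrix: row i is [1, x_i] with distinct x_i,
# so any K rows form an invertible K x K matrix.
GENERATOR = [[Fraction(1), Fraction(i + 1)] for i in range(N)]


def invert_2x2(m):
    """Invert a 2 x 2 matrix exactly."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def encode(data):
    """Produce N shards from K data symbols."""
    return mat_vec(GENERATOR, data)


# Regeneration information, computed once: one inverted matrix
# per K-sized subset of shard indices.
PRECOMPUTED = {
    subset: invert_2x2([GENERATOR[i] for i in subset])
    for subset in combinations(range(N), K)
}


def regenerate(shards, lost):
    """Rebuild shard `lost` from the first K available shards,
    looking up the stored inverse instead of inverting on demand."""
    available = tuple(
        i for i in range(N) if i != lost and shards[i] is not None
    )[:K]
    inverse = PRECOMPUTED[available]
    data = mat_vec(inverse, [shards[i] for i in available])
    return sum(g * d for g, d in zip(GENERATOR[lost], data))
```

In this sketch the table holds one matrix per subset (six for these parameters), so regenerating a shard costs a lookup and two small matrix-vector products rather than a fresh inversion.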
Abstract:
Erasure encoded fragments are originally generated by applying an erasure encoding scheme to a data file. An erasure encoded fragment is subsequently generated directly from previously generated erasure encoded fragments or by reconstructing the original data file and then erasure encoding the reconstructed data file. The integrity or fidelity of such a subsequently generated erasure encoded fragment is verified by newly generating an error detection code, such as but not limited to a checksum, for the subsequently generated erasure encoded fragment, and comparing that newly generated error detection code against an error detection code previously generated for a previous or original version of the erasure encoded fragment. Each error detection code is preferably stored in association with its corresponding erasure encoded fragment and with one or more other erasure encoded fragments. Thus, each error detection code is saved in at least two locations.
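The verification step can be sketched as follows. This is an illustration under stated assumptions rather than the described system: CRC-32 stands in for the error detection code, each code is stored with its own fragment and with the next fragment in a ring, and the record layout and function names are hypothetical.

```python
import zlib


def checksum(fragment: bytes) -> int:
    """Error detection code for a fragment (CRC-32 assumed here)."""
    return zlib.crc32(fragment)


def store_fragments(fragments):
    """Build one record per fragment. Each record keeps the fragment's
    own code plus a second copy of the previous fragment's code, so
    every code is saved in two locations."""
    n = len(fragments)
    return [
        {
            "fragment": frag,
            "own_edc": checksum(frag),
            "neighbor_edc": checksum(fragments[(i - 1) % n]),
        }
        for i, frag in enumerate(fragments)
    ]


def verify_regenerated(records, index, regenerated: bytes) -> bool:
    """Check a regenerated fragment against the copy of its original
    code held by the next record in the ring."""
    expected = records[(index + 1) % len(records)]["neighbor_edc"]
    return checksum(regenerated) == expected
```

Keeping the second copy with a different fragment means the original code is still available for comparison when the fragment it describes, and the code stored alongside it, have been lost together.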
Abstract:
If none of the fragments of an erasure-coded data file has become corrupted, the original data file can be readily reconstructed. If one or more fragments have become corrupted, it may still be possible to find a combination of fragments that reconstructs the original data file, but the number of possible combinations may be impracticably large. If an attempt with a first set of fragments fails, an efficient approach is to use an independent set of fragments for the second attempt. Then, for further attempts, the results of a current attempt are compared with previous results. If a match is found, the original data file has been reconstructed. An original data file may also be reconstructed by separately recovering each data block of the data file from corresponding fragment blocks and assembling the data file from the recovered data blocks.
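The compare-with-previous-results strategy can be sketched as below. Several simplifications are assumed: a toy code in which two data values are encoded into six fragments as points on a line, exact rational arithmetic, and no modeling of how a failed attempt is detected. The sketch treats agreement between decodes from disjoint fragment sets as success. All names are hypothetical.

```python
from fractions import Fraction

XS = [1, 2, 3, 4, 5, 6]  # assumed evaluation points, one per fragment
K = 2                    # fragments needed for one decode attempt


def encode(data):
    """Encode (a, b) as the values a + b*x at each point in XS."""
    a, b = data
    return [a + b * x for x in XS]


def decode(fragments, pair):
    """Recover (a, b) from the two fragments indexed by `pair`."""
    i, j = pair
    b = Fraction(fragments[j] - fragments[i], XS[j] - XS[i])
    a = fragments[i] - b * XS[i]
    return (a, b)


def reconstruct(fragments):
    """Decode from successive disjoint fragment sets and return the
    first result that matches an earlier attempt, or None if no two
    attempts agree."""
    seen = []
    for start in range(0, len(fragments) - K + 1, K):
        candidate = decode(fragments, (start, start + 1))
        if candidate in seen:
            return candidate
        seen.append(candidate)
    return None
```

Because each attempt uses fragments that no earlier attempt touched, a single corrupted fragment can spoil at most one attempt, so the number of decodes stays linear in the number of fragments instead of growing with the number of possible combinations.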