Abstract:
A network-attachable data transfer device housed within a shippable enclosure that incorporates an updateable electronic display for displaying shipping destination information is disclosed. The device may be initialized (e.g., prepared to receive data, and the updateable electronic shipping display set to the shipping destination) by a service provider and shipped, in accordance with the displayed destination address, as a self-contained shipping unit. The device may be installed onto a network at the destination and loaded with data. The display may also be updated with the next destination address such that the device is shipped to the updated destination address (e.g., back to the service provider, or on to other destinations before being sent back to the service provider). When the device is received back at the service provider, the data is transferred from the device to a service provider storage facility, and the device is wiped of data and prepared to be sent out again.
Abstract:
Techniques described and suggested herein include systems and methods for optimizing retrieval of data archives stored on data storage systems using redundancy coding techniques, based on the localities of a requestor and of various components of the data storage system. For example, redundancy coded shards, which may include identity shards that contain unencoded original data of archives, may be configured such that a variable number of the shards can be leveraged to meet performance requirements or time-to-retrieval limitations for retrieval requests associated with the archives stored and/or encoded therein. Under some circumstances, implementing systems may monitor relative geographic locations, among other performance-related metrics, so as to retrieve data such that fewer hosting data storage facilities are used for a given retrieval.
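As a rough illustration of retrieving an archive from as few facilities as possible, the following Python sketch groups shards by hosting facility and draws from the facilities holding the most shards first. The function name `pick_shards`, the shard-to-facility mapping, and the selection rule are assumptions made for this example. The abstract does not specify a selection algorithm, and this sketch ignores requestor distance and other performance metrics.

```python
from collections import defaultdict


def pick_shards(shard_locations, needed):
    """Choose `needed` shards while touching as few facilities as possible.

    shard_locations maps a shard id to the facility that hosts it
    (a hypothetical layout, not taken from the source).
    """
    by_facility = defaultdict(list)
    for shard, facility in shard_locations.items():
        by_facility[facility].append(shard)

    facilities, chosen = [], []
    # Visit facilities with the most shards first; ties break by name.
    ordered = sorted(by_facility.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for facility, shards in ordered:
        if len(chosen) >= needed:
            break
        facilities.append(facility)
        chosen.extend(shards[: needed - len(chosen)])

    if len(chosen) < needed:
        raise ValueError("not enough shards available")
    return facilities, chosen
```

Because each shard lives in exactly one facility in this model, taking the largest facilities first minimizes the number of facilities contacted.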
Abstract:
Techniques for encoding data storage systems using grid encoded data storage systems are described herein. Data to be stored in a data storage system is obtained and the data is stored in a grid of shards using grid encoding techniques that store the data in a combination of data shards and derived shards. Each of the shards has at least a first index corresponding to one dimension of the grid and a second index corresponding to a second dimension of the grid. Loss of a plurality of data shards can be repaired because each shard is reproducible from one or more shards with a first index that is associated with the first index of the shard and is also reproducible from one or more shards with a second index that is associated with the second index of the shard.
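The two-way reproducibility described above can be sketched in Python with a toy grid. This example assumes simple XOR parity as the derived-shard code, which the abstract does not name; the function names and the grid layout (one parity shard per row, plus one parity row) are likewise hypothetical.

```python
from functools import reduce


def xor_bytes(blocks):
    """XOR a sequence of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_grid(data_rows):
    """Append a derived shard to each row, then a derived row of column parities."""
    grid = [row + [xor_bytes(row)] for row in data_rows]
    grid.append([xor_bytes(column) for column in zip(*grid)])
    return grid


def repair_from_row(grid, r, c):
    """Rebuild shard (r, c) from the other shards sharing its first index."""
    return xor_bytes([s for j, s in enumerate(grid[r]) if j != c])


def repair_from_column(grid, r, c):
    """Rebuild shard (r, c) from the other shards sharing its second index."""
    return xor_bytes([row[c] for i, row in enumerate(grid) if i != r])
```

With this layout, a shard whose row has lost a second shard can still be rebuilt from its column, which is the property that lets the grid tolerate the loss of several data shards.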
Abstract:
In response to receiving a request from a client to store an object, a key-durable storage system may assign the object to a volume in its data store, generate a key for the object (e.g., an opaque identifier that encodes information for locating the object in the data store), store the object on one disk in the assigned volume, store the key redundantly in the assigned volume (e.g., using a replication or erasure coding technique), and may return the key to the client. To retrieve the object, the client may send a request including the key, and the system may return the object to the client. If a disk fails, the system may determine which objects were lost, and may return the corresponding keys to the appropriate clients in a notification. The system may be used to back up a more expensive object-redundant storage system.
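The store, retrieve, and failure-notification flow above can be modeled with a small in-memory Python sketch. Everything here is an assumption for illustration: the class name `KeyDurableStore`, the round-robin placement, the base64-encoded JSON key format, and plain replication of keys across every disk in the volume. The abstract only requires that the key be opaque, encode location information, and be stored redundantly.

```python
import base64
import json


class KeyDurableStore:
    """Toy model: each object lives on one disk; its key is copied to every disk."""

    def __init__(self, num_volumes=2, disks_per_volume=3):
        # Needs at least two disks per volume so a key copy survives a failure.
        self.volumes = [
            [{"objects": {}, "keys": set()} for _ in range(disks_per_volume)]
            for _ in range(num_volumes)
        ]
        self.counter = 0

    @staticmethod
    def _locate(key):
        """Decode the (volume, disk) location embedded in a key."""
        loc = json.loads(base64.urlsafe_b64decode(key))
        return loc["v"], loc["d"]

    def put(self, data):
        self.counter += 1
        v = self.counter % len(self.volumes)
        d = self.counter % len(self.volumes[v])
        raw = json.dumps({"v": v, "d": d, "n": self.counter}).encode()
        key = base64.urlsafe_b64encode(raw).decode()
        self.volumes[v][d]["objects"][key] = data  # object stored once
        for disk in self.volumes[v]:               # key stored redundantly
            disk["keys"].add(key)
        return key

    def get(self, key):
        v, d = self._locate(key)
        return self.volumes[v][d]["objects"][key]

    def fail_disk(self, v, d):
        """Simulate a disk loss and return the keys of the objects lost with it."""
        self.volumes[v][d] = {"objects": {}, "keys": set()}  # empty replacement disk
        survivor = next(disk for i, disk in enumerate(self.volumes[v]) if i != d)
        return sorted(k for k in survivor["keys"] if self._locate(k)[1] == d)
```

The list returned by `fail_disk` is what a notification to clients would carry; the objects themselves are gone, but their keys survive on the remaining disks.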
Abstract:
Techniques described and suggested herein include systems and methods for precomputing regeneration information for data archives (“archives”) that have been processed and stored using redundancy coding techniques. For example, regeneration information, such as redundancy code-related matrices (e.g., inverted matrices based on a generator matrix for the selected redundancy code) corresponding to subsets of the shards, is computed for each subset and, in some embodiments, stored for use in the event that one or more shards become unavailable, e.g., so as to more efficiently and/or quickly regenerate a replacement shard.
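A minimal Python sketch of the precomputation idea follows. It assumes a hypothetical 4x2 generator matrix `G` whose first two rows produce identity shards, and it uses exact rational arithmetic instead of the finite-field arithmetic a production redundancy code would typically use. The inverse for every two-shard subset is computed once up front and simply looked up when a shard must be regenerated.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical generator matrix: rows 0 and 1 yield identity shards,
# rows 2 and 3 yield derived shards. Any two rows are invertible.
G = [[1, 0], [0, 1], [1, 1], [1, 2]]


def invert_2x2(m):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def encode(data):
    """Produce one shard value per generator row."""
    return [sum(g * x for g, x in zip(row, data)) for row in G]


# Regeneration information, precomputed for every 2-shard subset.
REGEN = {
    subset: invert_2x2([G[i] for i in subset])
    for subset in combinations(range(len(G)), 2)
}


def regenerate(missing, available):
    """Rebuild shard `missing` from `available` (index -> value, 2+ entries)."""
    subset = tuple(sorted(available))[:2]
    inverse = REGEN[subset]  # lookup, no inversion at repair time
    values = [available[i] for i in subset]
    data = [sum(c * v for c, v in zip(row, values)) for row in inverse]
    return sum(g * x for g, x in zip(G[missing], data))
```

For example, `encode([5, 9])` gives the shards `[5, 9, 14, 23]`, and `regenerate(0, {2: 14, 3: 23})` recovers the first shard from the two derived shards.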
Abstract:
Techniques for extending a grid encoded data storage system to additional datacenters are described herein. A grid of shards with a first index and a second index is created and a set of null shards is added to the grid of shards. When a data object is received for storage in the grid of shards, a set of shards with the same first index is selected for the storage location with at least one null shard and one or more other shards. The null shard is enabled for data storage by allocating a storage device for the null shard. The grid is then updated by storing at least a portion of the data object in the set of shards, updating derived shards in the set of shards, and updating derived shards with the same second index as the updated shards.
Abstract:
A switching device is implemented in a network-attachable data transfer device to provide data storage access to other such devices. In some embodiments, network-attachable data transfer devices are arranged in a clustered configuration to provide various computational and storage services. When one or more devices of the cluster fail, various implementations associated with the switching device, via an external data interface, provide operational mitigation, optimized data recovery, and efficient reinstatement of normal operation of the cluster.
Abstract:
Techniques and methods for generating and implementing multiple layers of redundancy coded data are disclosed. For example, a redundancy coding scheme may include data elements that include data that is unencoded relative to the input, yet may still fully participate in providing redundancy to any data element in a given set. In a layered scheme, the input may include a bundle or group of encoded (or unencoded) data elements, thereby nesting two or more layers of redundancy coding. The specific amount of redundancy generated by such a scheme may be adjusted and adapted to failure characteristics of the entity on which the data elements are stored.
Abstract:
A system for storing data includes a rack, one or more data storage drive assemblies coupled to the rack, and a data control module coupled to the rack. The data storage drive assemblies include one or more drive mechanical modules configured to store data and one or more drive control modules coupled to the drive mechanical modules. The drive control modules control mechanical operations in the drive mechanical modules. The drive mechanical modules and the associated drive control modules are separable from one another without removing the other module from the at least one data storage drive assembly.
Abstract:
Ranges of data stored within archived data may be retrieved according to a predefined hash tree schema. A retrieval request for a range of one or more data chunks of an archived data object stored in an archival data store may be received. In response, the requested range of the archived data object may be determined to be tree-hash aligned. In response to determining that the requested range is tree-hash aligned, a retrieval job may be initiated to obtain the range of one or more data chunks and to stage the one or more data chunks for download. A download request may be received for one or more of the obtained and staged data chunks, and if the requested chunks are determined to be tree-hash aligned, a tree hash root node may be sent to the requesting client in addition to the requested data.
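One way to picture tree-hash alignment is sketched below in Python. The schema here is an assumption, since the abstract only says the schema is predefined: chunks are hashed with SHA-256, adjacent hashes are paired level by level, and an unpaired last node is promoted unchanged. Under that assumption, a chunk range is treated as aligned when it is exactly the span covered by some node of the whole object's tree, so the root hash computed over the range alone matches that node.

```python
import hashlib


def tree_hash_root(chunks):
    """Root of a binary hash tree over a non-empty list of byte chunks."""
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        nxt = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is promoted unchanged
        level = nxt
    return level[0]


def tree_node_ranges(total_chunks):
    """All half-open chunk ranges [start, end) covered by a node of the tree."""
    level = [(i, i + 1) for i in range(total_chunks)]
    nodes = set(level)
    while len(level) > 1:
        nxt = [
            (level[i][0], level[i + 1][1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])
        nodes.update(nxt)
        level = nxt
    return nodes


def is_tree_hash_aligned(start, end, total_chunks):
    """True if chunks [start, end) correspond to a single node of the tree."""
    return (start, end) in tree_node_ranges(total_chunks)
```

For a five-chunk object, the ranges `[0, 4)` and `[4, 5)` are aligned while `[1, 3)` is not, and a client holding the range roots for `[0, 4)` and `[4, 5)` can combine them to check the root of the whole object.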