Abstract:
Data de-duplication schemes reduce the amount of storage necessary to store a data set by dividing the data into segments and storing a segment identifier on a storage medium in place of each data segment. Each unique data segment is stored in a repository, and duplicate data segments are not stored. Methods and apparatus are provided for distributing data segments across multiple repositories in a data storage system, thereby reducing the quantity of data stored at a particular repository. Segments are assigned to repositories based upon a characteristic of the segments. The characteristic may be the length of the segment or some other value produced by a repeatable, uniformly-distributed function of the segment. The characteristic may be stored on the storage medium along with the segment identifier. The original data may be regenerated by retrieving the segment identifiers and characteristics from the storage medium and retrieving each segment from the repository identified by the characteristic.
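The repository-selection idea can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes fixed-size segments, SHA-256 segment identifiers, and a characteristic computed as a hash prefix modulo the repository count. The names `characteristic`, `store` and `regenerate` are hypothetical.

```python
import hashlib

def characteristic(segment: bytes, num_repos: int) -> int:
    # Hypothetical characteristic: a repeatable, roughly uniform function of the
    # segment (a SHA-256 prefix reduced modulo the number of repositories).
    return int.from_bytes(hashlib.sha256(segment).digest()[:4], "big") % num_repos

def store(data: bytes, repos: list, seg_size: int = 8) -> list:
    # Returns the (segment identifier, characteristic) pairs that would be
    # written to the storage medium in place of the data itself.
    layout = []
    for i in range(0, len(data), seg_size):
        segment = data[i:i + seg_size]
        seg_id = hashlib.sha256(segment).hexdigest()
        repo = characteristic(segment, len(repos))
        repos[repo].setdefault(seg_id, segment)  # a duplicate segment is not stored again
        layout.append((seg_id, repo))
    return layout

def regenerate(layout: list, repos: list) -> bytes:
    # The characteristic names the repository; the identifier names the segment in it.
    return b"".join(repos[repo][seg_id] for seg_id, repo in layout)

repos = [{} for _ in range(4)]  # four hypothetical repositories
data = b"abcdefgh" * 3 + b"12345678"
layout = store(data, repos)
assert regenerate(layout, repos) == data
print(sum(len(r) for r in repos))  # 2 unique segments stored
```

Replacing the characteristic with `len(segment) % num_repos` gives the length-based variant. That variant only spreads segments across repositories when segment lengths vary, for example under content-defined chunking.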
Abstract:
Systems and associated methods provide a level of indirection between multiple host computers and multiple data storage resources, such as removable media data storage devices. At least one of the hosts is not provided with direct access to some of the removable media data storage devices. Instead, logical addresses are provided to the host computers, where the logical addresses can relate to physical addresses associated with the data storage resources. A data storage resource handle or logical proxy may be presented to a host, and a management layer determines whether the host receives access to physical data storage resources, or virtual resources that emulate the physical resources.
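The level of indirection can be pictured as a lookup owned by a management layer. Hosts see only logical addresses. The layer resolves each address either to the physical device or to a virtual stand-in, depending on whether the host has direct access. This is a sketch under assumptions: the access-control rule and all class names (`ManagementLayer`, `PhysicalDrive`, `VirtualDrive`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PhysicalDrive:
    address: str
    def describe(self) -> str:
        return f"physical:{self.address}"

@dataclass
class VirtualDrive:
    emulates: str  # physical address whose behaviour is emulated
    def describe(self) -> str:
        return f"virtual:{self.emulates}"

@dataclass
class ManagementLayer:
    drives: dict[str, PhysicalDrive]     # physical address -> drive
    logical_map: dict[str, str]          # logical address given to hosts -> physical address
    direct_access: dict[str, set[str]]   # host -> physical addresses it may reach directly

    def resolve(self, host: str, logical: str):
        physical = self.logical_map[logical]
        if physical in self.direct_access.get(host, set()):
            return self.drives[physical]         # host receives the real device
        return VirtualDrive(emulates=physical)   # host receives an emulation instead

layer = ManagementLayer(
    drives={"drive0": PhysicalDrive("drive0")},
    logical_map={"lun7": "drive0"},
    direct_access={"hostA": {"drive0"}},
)
print(layer.resolve("hostA", "lun7").describe())  # physical:drive0
print(layer.resolve("hostB", "lun7").describe())  # virtual:drive0
```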
Abstract:
A system and method for use in an automated data storage cartridge library defines cartridges for use with an external host computer (“open” cartridges), and cartridges for use only internal to the library (“closed” cartridges). Cartridges may be “virtualized” by storing data from them on disk or closed cartridges, and then “realized” by writing data to physical cartridges. Virtual cartridges may be logically exported from one library to another. When new cartridges are introduced to the library, they may be designated with one of multiple designations or uses.
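The cartridge life cycle described above (designate, virtualize, export, realize) can be modelled as moves between a physical map and a virtual map. This sketch rests on several assumptions. A plain dict stands in for storage "on disk or closed cartridges". Export moves the virtual image without touching any physical media. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Designation(Enum):
    OPEN = "open"      # usable by an external host computer
    CLOSED = "closed"  # used only internally by the library

@dataclass
class Cartridge:
    barcode: str
    designation: Designation
    data: bytes = b""

class Library:
    def __init__(self, name: str):
        self.name = name
        self.physical: dict[str, Cartridge] = {}  # cartridges present in the library
        self.virtual: dict[str, bytes] = {}       # stand-in for disk or closed-cartridge storage

    def introduce(self, barcode: str, designation: Designation) -> None:
        self.physical[barcode] = Cartridge(barcode, designation)

    def virtualize(self, barcode: str) -> None:
        # Keep the cartridge's data, release the physical cartridge.
        self.virtual[barcode] = self.physical.pop(barcode).data

    def realize(self, barcode: str, designation: Designation = Designation.OPEN) -> None:
        # Write the virtual cartridge's data back out to a physical cartridge.
        self.physical[barcode] = Cartridge(barcode, designation, self.virtual.pop(barcode))

    def export_virtual(self, barcode: str, other: "Library") -> None:
        # Logical export: only the virtual image moves between libraries.
        other.virtual[barcode] = self.virtual.pop(barcode)

lib_a, lib_b = Library("A"), Library("B")
lib_a.introduce("C001", Designation.OPEN)
lib_a.physical["C001"].data = b"payload"
lib_a.virtualize("C001")
lib_a.export_virtual("C001", lib_b)
lib_b.realize("C001")
print(lib_b.physical["C001"].data)  # b'payload'
```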
Abstract:
Example methods and apparatus concern file repair. One example method includes storing a file in a file store and also parsing the file into a set of constituent data blocks. The method includes selectively storing, in a data store, unique data blocks from the set of constituent data blocks. The method includes maintaining, in a combination of the file store and the data store, a threshold number of copies of data blocks. The method also includes maintaining a data structure that stores data for locating the file in the file store and that stores data for recreating the file from data blocks. The method also includes maintaining a data structure that stores data for locating multiple copies of data found in members of the set of constituent data blocks. Files can be repaired using data blocks parsed from stored files or using data blocks held in the data store.
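The repair scheme can be sketched with four dictionaries. Whole files live in the file store. Unique blocks live in the data store. A per-file recipe lists block digests. A locations map records where copies of each block sit in the file store. The block size, the copy threshold of 2, and SHA-256 verification are assumptions, and the class and method names are hypothetical. A block is added to the data store only while its total copy count is below the threshold. Repair takes each block from any copy whose digest still matches.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size in bytes
THRESHOLD = 2   # hypothetical minimum number of copies of each block

def split(data: bytes) -> list:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class RepairableStore:
    def __init__(self):
        self.file_store: dict[str, bytes] = {}        # whole files
        self.recipes: dict[str, list[str]] = {}       # file -> block digests, to recreate it
        self.data_store: dict[str, bytes] = {}        # unique blocks
        self.locations: dict[str, list[tuple[str, int]]] = {}  # digest -> (file, offset)

    def copies(self, key: str) -> int:
        # Copies inside stored files plus at most one copy in the data store.
        return len(self.locations.get(key, [])) + (key in self.data_store)

    def put(self, name: str, data: bytes) -> None:
        self.file_store[name] = data
        self.recipes[name] = []
        for i, block in enumerate(split(data)):
            key = digest(block)
            self.recipes[name].append(key)
            self.locations.setdefault(key, []).append((name, i * BLOCK_SIZE))
            if self.copies(key) < THRESHOLD and key not in self.data_store:
                self.data_store[key] = block  # top up to the copy threshold

    def _intact_copy(self, key: str) -> bytes:
        candidate = self.data_store.get(key)
        if candidate is not None and digest(candidate) == key:
            return candidate
        for name, offset in self.locations.get(key, []):
            candidate = self.file_store[name][offset:offset + BLOCK_SIZE]
            if digest(candidate) == key:  # skip damaged copies
                return candidate
        raise LookupError(f"no intact copy of block {key[:8]}")

    def repair(self, name: str) -> bytes:
        self.file_store[name] = b"".join(self._intact_copy(k) for k in self.recipes[name])
        return self.file_store[name]
```

For example, after `put("a", b"AAAABBBB")` and `put("b", b"BBBBCCCC")`, a damaged `"a"` is rebuilt from the data store. If the data-store copy of `BBBB` is also lost, the repair falls back to the block parsed from file `"b"`.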
Abstract:
An example method includes controlling a data de-duplication apparatus to arrange a de-duplication schedule based on the presence or absence of a replication indicator in an item to be de-duplicated. The method also includes selectively controlling the de-duplication schedule based on a replication priority. In one embodiment, the method includes, upon determining that a chunk of data is associated with a replication indicator, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks not associated with a replication indicator. In one embodiment, the method also includes, upon determining that the chunk is associated with a replication priority, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks of data not associated with a replication priority. The schedule location is based, at least in part, on the replication priority. The method also includes controlling de-duplication order based on the schedule.
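The scheduling rule can be expressed as a sort key on a priority queue. Chunks with a replication indicator go ahead of chunks without one. Among those, chunks carrying a replication priority go ahead of chunks that carry none, and the priority value sets their position. This sketch assumes that a larger priority value means more urgent and that arrival order breaks ties. The class names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    name: str
    replicate: bool = False         # replication indicator present?
    priority: Optional[int] = None  # replication priority (assumed: larger = more urgent)

class DedupScheduler:
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # FIFO tie-breaker; keeps keys unique

    def submit(self, chunk: Chunk) -> None:
        key = (
            not chunk.replicate,     # replication-indicated chunks first
            chunk.priority is None,  # then chunks that carry a replication priority
            -(chunk.priority or 0),  # higher priority earlier
            next(self._arrival),
        )
        heapq.heappush(self._heap, (key, chunk))

    def order(self):
        # De-duplication order follows the schedule.
        while self._heap:
            yield heapq.heappop(self._heap)[1].name

sched = DedupScheduler()
for c in [Chunk("plain"), Chunk("rep", True), Chunk("rep_lo", True, 1), Chunk("rep_hi", True, 5)]:
    sched.submit(c)
print(list(sched.order()))  # ['rep_hi', 'rep_lo', 'rep', 'plain']
```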
Abstract:
Example apparatus and methods concern no-touch synthetic full backups, in which a new backup is created using information about previous backups but without reading data from the existing backups. The no-touch synthetic backup can be created by correlating file system information, backup specification information, and dedupe system information. One example method includes accessing a set of target extents associated with a synthetic backup image overlay specification and accessing a set of source extents associated with a file stored in a previous backup image. The set of source extents is selected so that it can provide data sufficient to cover the data described in the set of target extents. The method includes creating a set of correlation extents that bridge the gap between the original specification and the final specification.
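Building correlation extents is an interval-intersection problem over metadata only, so no backup data is read. This sketch makes several assumptions. Target and source extents share one file-offset coordinate space. Source extents do not overlap. Each source extent records which previous image holds its bytes and where. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    offset: int  # file offset that the synthetic image must contain
    length: int

@dataclass(frozen=True)
class SourceExtent:
    offset: int        # file offset covered by this source
    length: int
    image: str         # previous backup image that holds the bytes
    image_offset: int  # where those bytes sit inside that image

@dataclass(frozen=True)
class Correlation:
    offset: int
    length: int
    image: str
    image_offset: int

def correlate(targets: list, sources: list) -> list:
    # Map every target byte range onto existing source bytes, using metadata only.
    out = []
    for t in targets:
        covered = 0
        for s in sorted(sources, key=lambda s: s.offset):
            start = max(t.offset, s.offset)
            end = min(t.offset + t.length, s.offset + s.length)
            if start < end:  # the two extents overlap
                out.append(Correlation(start, end - start, s.image,
                                       s.image_offset + (start - s.offset)))
                covered += end - start
        if covered < t.length:  # assumes non-overlapping sources
            raise ValueError(f"target extent at {t.offset} is not fully covered")
    return out

sources = [SourceExtent(0, 60, "full1", 1000), SourceExtent(60, 40, "incr2", 0)]
print(correlate([Extent(50, 20)], sources))
```

For the target range 50 to 70, this yields one correlation extent reading 10 bytes at offset 1050 of `full1`, followed by one reading 10 bytes at offset 0 of `incr2`.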
Abstract:
A computer-implemented method for deduplicating an incoming data sequence can include the steps of storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index; sequentially storing signature values for at least some of the plurality of data blocklets of the parent data sequence in a first storage location outside the deduplication index; determining that a first data blocklet in the incoming data sequence is absent from the parent data sequence; storing a signature value for the first data blocklet in a second storage location outside the deduplication index; storing, in the second storage location, a guarded link linking the first data blocklet to a second data blocklet that follows it in the incoming data sequence; determining that the second data blocklet is present in the parent data sequence, its signature value being stored in the first storage location; and copying at least a portion of the contents of the first storage location and the second storage location into a cache to expedite access during deduplication of the incoming data sequence.
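The steps above can be traced in a short Python sketch. The abstract does not define a "guarded link". Here it is assumed to be a record holding the signatures at both ends, so it can be checked before it is followed. SHA-1 signatures, the prefetch `window`, and all names are likewise assumptions. Consecutive new blocklets are chained with links as well, which goes slightly beyond the single link the abstract describes.

```python
import hashlib

def sig(blocklet: bytes) -> str:
    return hashlib.sha1(blocklet).hexdigest()

def dedupe(parent: list, incoming: list, window: int = 2):
    index = {sig(b): i for i, b in enumerate(parent)}  # deduplication index
    parent_log = [sig(b) for b in parent]               # first location: parent signatures, in sequence
    new_log = []                                        # second location: records for new blocklets
    cache = {"parent": [], "new": []}
    pending = None                                      # signature of the most recent new blocklet
    flushed = 0                                         # new_log entries already copied into the cache
    for blocklet in incoming:
        s = sig(blocklet)
        if s not in index:
            if pending is not None:
                new_log.append(("link", pending, s))    # chain consecutive new blocklets
            new_log.append(("sig", s))
            pending = s
        elif pending is not None:
            # Guarded link from the new blocklet to the parent blocklet that follows it.
            new_log.append(("link", pending, s))
            pending = None
            pos = index[s]
            cache["parent"].extend(parent_log[pos:pos + window])  # prefetch the resumed parent run
            cache["new"].extend(new_log[flushed:])
            flushed = len(new_log)
    return new_log, cache

new_log, cache = dedupe([b"a", b"b", b"c", b"d"], [b"a", b"x", b"c", b"d"])
print([kind for kind, *_ in new_log])  # ['sig', 'link']
```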