Abstract:
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.
Abstract:
Techniques for enabling user search of content stored in a file archive include providing a search interface comprising a search rules portion and an action rules portion, receiving a file archive search criterion comprising at least one search rule, and searching the file archive using the search criterion. The techniques also include generating a set of files filtered using the search criterion and performing an action specified in the action rules portion on a file included in the set of files.
Abstract:
A content alignment system according to certain embodiments aligns a sliding window at the beginning of a data segment. The content alignment system performs a block alignment function on the data within the sliding window. A deduplication block is established if the output of the block alignment function meets a predetermined criteria. At least part of a gap is established if the output of the block alignment function does not meet the predetermined criteria. The predetermined criteria is changed if a threshold number of outputs fail to meet the predetermined criteria.
Abstract:
Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.
Abstract:
Techniques for enabling user search of content stored in a file archive include providing a search interface comprising a search rules portion and an action rules portion, receiving a file archive search criterion comprising at least one search rule, and searching the file archive using the search criterion. The techniques also include generating a set of files filtered using the search criterion and performing an action specified in the action rules portion on a file included in the set of files.
Abstract:
A method and system for providing information management of data from hosted services receives information management policies for a hosted account of a hosted service, requests data associated with the hosted account from the hosted service, receives data associated with the hosted account from the hosted service, and provides a preview version of the received data to a computing device. In some examples, the system indexes the received data to associate the received data with a user of an information management system, and/or provides index information related to the received data to the computing device.
Abstract:
A method and system for providing information management of data from hosted services receives information management policies for a hosted account of a hosted service, requests data associated with the hosted account from the hosted service, receives data associated with the hosted account from the hosted service, and provides a preview version of the received data to a computing device. In some examples, the system indexes the received data to associate the received data with a user of an information management system, and/or provides index information related to the received data to the computing device.
Abstract:
The data storage system according to certain aspects can implement partial file restore, where only a portion of the secondary copy of a file is restored. Such portion may be designated by one or more application offsets for the file. The system may provide an in-chunk index that includes mapping information between the application offsets and the secondary copy offsets. Chunks may refer to logical data units in which secondary copies are stored, and the in-chunk index for a chunk may be stored in secondary storage with the chunk. Because the mapping information may not be provided at a fixed interval, the system can search through application offsets in the in-chunk index to locate the secondary copy offset corresponding to the portion application offset(s). In this manner, the system may restore the designated portion of the secondary copy in a fast and efficient manner by using the in-chunk index.
Abstract:
Described are systems and methods for storing a variable number of instances of data objects (e.g., 1, 2, 3, or up to N−1 instances, where N is the number of instances of the data object included in primary data) in secondary storage across a data storage network. In some examples, a system for storing a variable number of instances of data objects includes, one or more computing devices storing a set of data objects and multiple storage devices distinct from the one or more computing devices. Each of the multiple storage devices is configured to store at least a single instance of a data object. The system also includes a database configured to store information associated with the data objects. This information includes substantially unique identifiers for the data objects and, for each of the data objects, a number of instances of the data object stored on the multiple storage devices.
Abstract:
According to certain aspects, a method can include receiving an indication that a restoration of a deduplication database using a secondary copy of a file associated with a secondary copy job is complete; retrieving a first data fingerprint from a data storage database, wherein the first data fingerprint is associated with the secondary copy job used to restore the deduplication database; retrieving a second data fingerprint from a deduplication database media agent, wherein the second data fingerprint is associated with the secondary copy job used to restore the deduplication database; comparing the first data fingerprint with the second data fingerprint to determine whether the first data fingerprint and the second data fingerprint match; and transmitting an instruction to the deduplication database media agent to rebuild the restored deduplication database in response to a determination that the first data fingerprint and the second data fingerprint do not match.