Abstract:
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.
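The match-then-skip flow described above can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the `DedupStore` class, the use of SHA-256 digests to detect matching data, and the `blob-N` location names are all assumptions introduced here.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store; names and structure are illustrative only."""

    def __init__(self):
        self.seen = {}   # content digest -> location of the stored copy
        self.writes = 0  # number of storage operations actually performed

    def store(self, name, data):
        """Return (location, performed) for the data object called `name`."""
        # Assumption: matching data is detected by comparing SHA-256 digests.
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.seen:
            # Matching data was stored earlier, so the operation is skipped.
            return self.seen[digest], False
        # No match: perform the storage operation in the usual manner.
        self.writes += 1
        location = f"blob-{self.writes}"
        self.seen[digest] = location
        return location, True


store = DedupStore()
first = store.store("a.txt", b"hello")
second = store.store("b.txt", b"hello")   # same content as a.txt
third = store.store("c.txt", b"world")
```

In this sketch the second request returns the location of the first copy without writing anything, so only two storage operations are performed for three data objects.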
Abstract:
An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
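One way to picture the verify-once behavior is a pass over the chunk references of a backup that consults an integrity table before hashing anything. The sketch below is an assumption-laden simplification: `verify_backup`, the dictionary layouts, and the SHA-256 check are hypothetical and stand in for the primary table, deduplication chunk table, and chunk integrity table named above.

```python
import hashlib


def verify_backup(references, chunk_store, integrity_table):
    """Verify each referenced chunk at most once.

    references:      chunk ids referenced by the backup (repeats allowed)
    chunk_store:     chunk id -> (data, expected SHA-256 hex digest)
    integrity_table: chunk id -> bool result, kept across calls
    Returns the number of chunks that were actually hashed in this call.
    """
    hashed = 0
    for chunk_id in references:
        if chunk_id in integrity_table:
            continue  # already verified, skip the expensive check
        data, expected = chunk_store[chunk_id]
        integrity_table[chunk_id] = hashlib.sha256(data).hexdigest() == expected
        hashed += 1
    return hashed


chunk_store = {
    1: (b"alpha", hashlib.sha256(b"alpha").hexdigest()),
    2: (b"beta", hashlib.sha256(b"beta").hexdigest()),
}
integrity_table = {}
first_pass = verify_backup([1, 2, 1, 1, 2], chunk_store, integrity_table)
second_pass = verify_backup([2, 1], chunk_store, integrity_table)
```

Here five references to two chunks cost two hash computations, and a later verification that references the same chunks costs none.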
Abstract:
The library server according to certain aspects can manage the use of tape drives according to the data requirements of different storage operation cells. The library server according to certain aspects can also facilitate automatic management of tape media in a tape library by allocating the tapes and slots to different cells. For instance, the library server can manage the positioning and placement of the tapes into appropriate slots within the tape library.
Abstract:
The data storage system according to certain aspects can implement partial file restore, where only a portion of the secondary copy of a file is restored. Such portion may be designated by one or more application offsets for the file. The system may provide an in-chunk index that includes mapping information between the application offsets and the secondary copy offsets. Chunks may refer to logical data units in which secondary copies are stored, and the in-chunk index for a chunk may be stored in secondary storage with the chunk. Because the mapping information may not be provided at a fixed interval, the system can search through application offsets in the in-chunk index to locate the secondary copy offset corresponding to the application offset(s) of the designated portion. In this manner, the system may restore the designated portion of the secondary copy in a fast and efficient manner by using the in-chunk index.
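Because the index entries are not spaced at a fixed interval, the lookup cannot be computed arithmetically and has to be a search. A minimal sketch, assuming the in-chunk index is a list of (application offset, secondary copy offset) pairs sorted by application offset; the `locate` function and the sample offsets are hypothetical.

```python
from bisect import bisect_right


def locate(index, app_offset):
    """Return the index entry with the greatest application offset <= app_offset.

    index: list of (application_offset, secondary_copy_offset) pairs,
           sorted by application offset, with irregular spacing.
    """
    keys = [application for application, _ in index]
    position = bisect_right(keys, app_offset) - 1
    if position < 0:
        raise ValueError("offset precedes the first index entry")
    return index[position]


# Hypothetical in-chunk index with entries at irregular intervals.
index = [(0, 0), (4096, 1500), (10000, 3200), (65536, 9000)]
entry = locate(index, 12000)
```

A restore of the portion starting at application offset 12000 would then begin reading the secondary copy at the offset recorded in `entry` and skip forward from there.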
Abstract:
An information management system according to certain aspects uses backup copies or other secondary copies of production data for the purposes of replicating production data to another client. The secondary copies can be deduplicated copies. By utilizing available secondary copies of the data for replication, the system can reduce the impact on the production machines associated with replication. Utilizing deduplicated copies not only reduces the amount of stored data, but also reduces the amount of data that is communicated between the source and the destination, increasing the speed of the replication process.
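The reduction in transferred data can be illustrated with a small sketch in which replication walks the chunk manifest of a deduplicated secondary copy and sends only chunks the destination lacks. The `replicate` function, the manifest layout, and the dictionary-based chunk stores are assumptions made for illustration.

```python
def replicate(manifest, source_chunks, destination_chunks):
    """Copy only the chunks the destination does not already hold.

    manifest:           ordered chunk digests that make up the secondary copy
    source_chunks:      digest -> data held in secondary storage
    destination_chunks: digest -> data at the destination (updated in place)
    Returns the number of chunks actually sent.
    """
    sent = 0
    for digest in manifest:
        if digest not in destination_chunks:
            destination_chunks[digest] = source_chunks[digest]
            sent += 1
    return sent


source_chunks = {"a": b"AAAA", "b": b"BBBB", "c": b"CCCC"}
destination_chunks = {"b": b"BBBB"}  # already present from an earlier run
sent = replicate(["a", "b", "a", "c"], source_chunks, destination_chunks)
```

The production machine is not involved at all in this sketch, and a manifest of four chunk references results in two transfers.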
Abstract:
The disclosed techniques can use deduplication information on a source computer platform to improve the process of performing data backups or restoration from/to the computer platform. In one example aspect, a data backup operation can re-use some of the work already done by a source computer's deduplication system. For example, a storage operation could read a deduplication database on the source computer platform to determine the duplicativeness of a given data chunk being transferred to a backup storage system, without having to perform computations such as data chunk hashing and comparison with previously generated hashes. The technique may additionally or alternatively reuse, during deduplication at the external backup storage system, hashes that the source computer generated while deduplicating the data file on its own file system.
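The hash reuse can be sketched as a backup loop that consults the source's deduplication database first and computes a hash only when no entry exists. Everything named here is hypothetical: `backup_chunks`, the chunk-id keyed database, and the choice of SHA-256 are assumptions, not details from the disclosure.

```python
import hashlib


def backup_chunks(chunks, source_dedup_db, backup_index):
    """Decide which chunks to transfer, reusing source-side hashes when present.

    chunks:          list of (chunk_id, data) pairs to back up
    source_dedup_db: chunk_id -> digest already computed by the source
    backup_index:    set of digests already held by the backup storage system
    Returns (ids of transferred chunks, number of hashes computed here).
    """
    transferred, computed = [], 0
    for chunk_id, data in chunks:
        digest = source_dedup_db.get(chunk_id)
        if digest is None:
            # No reusable hash on the source, fall back to hashing the data.
            digest = hashlib.sha256(data).hexdigest()
            computed += 1
        if digest not in backup_index:
            backup_index.add(digest)
            transferred.append(chunk_id)
    return transferred, computed


hash_x = hashlib.sha256(b"x").hexdigest()
hash_y = hashlib.sha256(b"y").hexdigest()
source_dedup_db = {"c1": hash_x, "c2": hash_y}
backup_index = {hash_y}
result = backup_chunks(
    [("c1", b"x"), ("c2", b"y"), ("c3", b"x")], source_dedup_db, backup_index
)
```

Only chunk `c3`, which has no entry in the source database, needs a hash computed during the backup, and only `c1` is transferred.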
Abstract:
Data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, are performed within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
Abstract:
The disclosed techniques include generation of a single index table when backing up data in a first backup format to a backup storage system that uses a second backup format. Using the single index table, a query for a data item can be answered by searching the single index table. The single index table avoids having to search through multiple index tables, each corresponding to a different backup format that may be used for backing up the searched data item.
Abstract:
An information management system according to certain aspects may determine whether snapshot operations will work prior to executing them. The system may check various factors or parameters relating to a snapshot storage policy to verify whether the storage policy will work at runtime without actually executing the policy. Some examples of factors can include: availability of primary storage devices for which a snapshot should be obtained, availability of secondary storage devices, license availability for snapshot software, user credentials for connecting to primary and/or secondary storage devices, available storage capacity, connectivity to storage devices, etc. The system may also check whether a particular system configuration is supported in connection with snapshot operations. The result of the determination can be provided in the form of a report summarizing any problems found with the snapshot storage policy. The report can include recommended courses of action or solutions for resolving any identified issues.
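A dry-run check of this kind can be sketched as a function that evaluates a policy against a description of the environment and returns a report instead of taking a snapshot. The `preflight` function, the dictionary keys, and the wording of the recommendations are hypothetical, and only three of the listed factors are covered.

```python
def preflight(policy, environment):
    """Check a snapshot policy without executing it.

    Returns a list of (problem, recommendation) pairs; an empty list means
    none of the checks in this sketch found an issue.
    """
    report = []
    devices = policy["primary_devices"] + policy["secondary_devices"]
    for device in devices:
        if device not in environment["reachable_devices"]:
            report.append(
                (f"device {device} unreachable", "check connectivity and credentials")
            )
    if environment["free_licenses"] < 1:
        report.append(
            ("no snapshot license available", "release or acquire a license")
        )
    if environment["free_bytes"] < policy["estimated_bytes"]:
        report.append(
            ("insufficient storage capacity", "free space or add capacity")
        )
    return report


policy = {
    "primary_devices": ["array-1"],
    "secondary_devices": ["vault-1"],
    "estimated_bytes": 500,
}
environment = {
    "reachable_devices": {"array-1"},
    "free_licenses": 0,
    "free_bytes": 1000,
}
report = preflight(policy, environment)
```

For the sample inputs the report flags the unreachable secondary device and the missing license, each paired with a suggested action, while the capacity check passes.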
Abstract:
According to certain aspects, a method can include receiving, in response to an indication that a data storage database is being restored to a second time before a first time such that the data storage database comprises a plurality of first archive file identifiers associated at the second time, a first instruction from a data storage computer, where the first instruction instructs a media agent to stop scheduled secondary storage operations associated with a deduplication database, and where the deduplication database comprises a plurality of second archive file identifiers; determining at least one second archive file identifier in the plurality of second archive file identifiers that does not correlate with any first archive file identifier in the plurality of first archive file identifiers; and, for each of the at least one second archive file identifier, instructing the deduplication database to prune an entry associated with the respective second archive file identifier.
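The determining and pruning steps amount to a set difference between the identifiers known to the deduplication database and those present in the restored data storage database. A minimal sketch, assuming identifiers are plain integers and the deduplication database is a dictionary; `entries_to_prune` and `prune` are hypothetical names, and the step of stopping scheduled operations is not modeled.

```python
def entries_to_prune(first_ids, second_ids):
    """Return second archive file identifiers with no matching first identifier.

    first_ids:  identifiers in the restored data storage database
    second_ids: identifiers in the deduplication database
    """
    return sorted(set(second_ids) - set(first_ids))


def prune(dedup_db, first_ids):
    """Remove deduplication entries for identifiers unknown to the restored database."""
    stale = entries_to_prune(first_ids, dedup_db.keys())
    for archive_file_id in stale:
        del dedup_db[archive_file_id]
    return stale


# Identifiers 4 and 5 were created after the point in time being restored to.
dedup_db = {1: "entry-1", 2: "entry-2", 3: "entry-3", 4: "entry-4", 5: "entry-5"}
pruned = prune(dedup_db, first_ids=[1, 2, 3])
```

After the call, the deduplication database again references only archive files that the restored database knows about.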