Managing deletions from a deduplication database

    Publication No.: US11119984B2

    Publication date: 2021-09-14

    Application No.: US16452309

    Filing date: 2019-06-25

    Abstract: An information management system can manage the removal of data block entries in a deduplicated data store using working copies of the data block entries residing in a local data store of a secondary storage computing device. The system can use the working copies to identify data blocks for removal. Once the deduplication database is updated with the changes to the working copies (e.g., using a transaction based update scheme), the system can query the deduplication database for the database entries identified for removal. Once identified, the system can remove the database entries identified for pruning and/or the corresponding deduplication data blocks from secondary storage.
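The flow in this abstract (mark working copies, push them to the deduplication database, then query and prune) might be sketched roughly as follows. This is an illustrative sketch only: `BlockEntry`, `mark_for_pruning`, `commit`, and `prune` are hypothetical names, plain dictionaries stand in for the deduplication database and secondary storage, the "no remaining references" rule is an assumed pruning criterion, and the single batch update is a simplified stand-in for the transaction-based scheme.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    block_id: str
    ref_count: int
    prune: bool = False


def mark_for_pruning(working_copies):
    """Flag working-copy entries that nothing references any more (assumed rule)."""
    for entry in working_copies.values():
        entry.prune = entry.ref_count == 0


def commit(ddb, working_copies):
    """Push the working copies to the dedup database as one batch.

    A real system would use a transaction; a single dict update stands in here.
    """
    staged = {
        key: BlockEntry(e.block_id, e.ref_count, e.prune)
        for key, e in working_copies.items()
    }
    ddb.update(staged)


def prune(ddb, block_store):
    """Query the database for flagged entries, then drop entries and blocks."""
    doomed = [block_id for block_id, e in ddb.items() if e.prune]
    for block_id in doomed:
        block_store.pop(block_id, None)  # remove the deduplicated data block
        del ddb[block_id]                # remove the database entry
    return doomed
```

The pruning decision is made only against the local working copies; the shared database is touched once for the update and once for the query.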

    Systems and methods for managing single instancing data

    Publication No.: US11016858B2

    Publication date: 2021-05-25

    Application No.: US14674229

    Filing date: 2015-03-31

    Abstract: Described in detail herein are systems and methods for managing single instancing data. Using a single instance database and other constructs (e.g. sparse files), data density on archival media (e.g. magnetic tape) is improved, and the number of files per storage operation is reduced. According to one aspect of a method for managing single instancing data, for each storage operation, a chunk folder is created on a storage device that stores single instancing data. The chunk folder contains three files: 1) a file that contains data objects that have been single instanced; 2) a file that contains data objects that have not been eligible for single instancing; and 3) a metadata file used to track the location of data objects within the other files. A second storage operation subsequent to a first storage operation contains references to data objects in the chunk folder created by the first storage operation instead of the data objects themselves.
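A chunk folder with its three files could be modelled in memory along these lines. The sketch is hypothetical throughout: `ChunkFolder`, `store`, the SHA-256 content key, the size-based eligibility rule, and the metadata tuple layout `(name, kind, folder_id, offset, length)` are all assumptions made for illustration, not details taken from the patent.

```python
import hashlib


class ChunkFolder:
    """In-memory stand-in for the three files of one chunk folder."""

    def __init__(self):
        self.si_file = bytearray()      # 1) single-instanced data objects
        self.non_si_file = bytearray()  # 2) objects not eligible for single instancing
        self.metadata = []              # 3) where each object lives


def store(objects, folder, si_index, folder_id,
          eligible=lambda data: len(data) >= 4):
    """Run one storage operation; `si_index` is shared across operations."""
    for name, data in objects:
        if not eligible(data):
            offset = len(folder.non_si_file)
            folder.non_si_file += data
            folder.metadata.append((name, "non_si", folder_id, offset, len(data)))
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest not in si_index:
            # First instance: write the bytes and remember where they are.
            si_index[digest] = (folder_id, len(folder.si_file), len(data))
            folder.si_file += data
        # Later instances only record a reference to the first copy.
        folder.metadata.append((name, "si", *si_index[digest]))
```

When a second operation stores an object already seen by the first, its own `si_file` stays empty and its metadata points back into the first operation's chunk folder.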

    LIVE BROWSE CACHE ENHANCEMENTS FOR LIVE BROWSING BLOCK-LEVEL BACKUP COPIES OF VIRTUAL MACHINES AND/OR FILE SYSTEMS

    Publication No.: US20210064484A1

    Publication date: 2021-03-04

    Application No.: US16870722

    Filing date: 2020-05-08

    Abstract: An illustrative approach accelerates live browse operations for block-level backup copies in a data storage management system. A cache storage area is maintained for locally storing and serving key data blocks, thus relying less on retrieving data on demand from backup copies. Live browse operations are used for populating the cache storage area for speedier retrieval during subsequent live browsing and/or file indexing of the same backup copy, and vice versa. The key data blocks cached while file indexing and/or live browsing an earlier backup copy help to pre-fetch corresponding data blocks of later backup copies, thus producing a beneficial learning cycle. The approach is especially beneficial for cloud and tape backup media, and is available for a variety of data sources and backup copies, including block-level backup copies of virtual machines (VMs) and block-level backup copies of file systems, including UNIX-based and Windows-based operating systems and corresponding file systems.
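The "learning cycle" described here, where blocks touched while browsing an earlier backup copy guide pre-fetching for a later one, might look something like the sketch below. `BrowseCache`, `fetch_block`, and the idea of tracking key blocks by block number are assumptions for illustration; the abstract does not say how key blocks are identified.

```python
class BrowseCache:
    """Caches blocks read during live browse and remembers which ones mattered."""

    def __init__(self, fetch_block):
        # fetch_block(copy_id, block_no) -> bytes is the slow path
        # (for example cloud or tape); it is a hypothetical callable.
        self.fetch_block = fetch_block
        self.blocks = {}         # (copy_id, block_no) -> bytes
        self.key_blocks = set()  # block numbers touched by earlier operations

    def read(self, copy_id, block_no):
        key = (copy_id, block_no)
        if key not in self.blocks:
            self.blocks[key] = self.fetch_block(copy_id, block_no)
        self.key_blocks.add(block_no)
        return self.blocks[key]

    def prefetch(self, copy_id):
        """Warm the cache for a later copy using blocks learned from earlier ones."""
        for block_no in sorted(self.key_blocks):
            key = (copy_id, block_no)
            if key not in self.blocks:
                self.blocks[key] = self.fetch_block(copy_id, block_no)
```

After `prefetch` runs for a new backup copy, reads of the previously learned block numbers are served locally instead of from the backup media.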

    Restore of secondary data using thread pooling

    Publication No.: US10915255B2

    Publication date: 2021-02-09

    Application No.: US16722756

    Filing date: 2019-12-20

    Abstract: A system according to certain aspects may include a secondary storage controller computer configured to: in response to a first instruction to obtain a first secondary copy of a first data set from a secondary storage device(s), the first instruction associated with a first restore operation: instantiate a first restore thread on a processor of the secondary storage controller computer; using the first restore thread, retrieve the first secondary copy from the secondary storage device(s); and forward the retrieved first secondary copy to a primary storage subsystem for storage; and in response to a second instruction to obtain a second secondary copy of a second data set from the secondary storage device(s), the second instruction associated with a second restore operation: using the first restore thread, retrieve the second secondary copy from the secondary storage device(s); and forward the retrieved second secondary copy to the primary storage subsystem for storage.
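The core idea, reusing an already instantiated restore thread for a second restore operation, can be illustrated with Python's standard `ThreadPoolExecutor`. This is only an analogy under stated assumptions: `SECONDARY`, `primary`, and `restore` are hypothetical stand-ins for the secondary storage device(s), the primary storage subsystem, and the restore routine.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for secondary storage and the primary storage subsystem.
SECONDARY = {"copy1": b"first data set", "copy2": b"second data set"}
primary = {}


def restore(copy_id):
    """Retrieve a secondary copy and forward it to primary storage."""
    data = SECONDARY[copy_id]
    primary[copy_id] = data
    return threading.get_ident()  # report which thread did the work


# A pool of one worker: the thread is created for the first restore and
# then reused for the second instead of instantiating a new one.
with ThreadPoolExecutor(max_workers=1) as pool:
    first_thread = pool.submit(restore, "copy1").result()
    second_thread = pool.submit(restore, "copy2").result()
```

Both calls report the same thread identifier, which is the reuse behaviour the abstract describes for the second restore operation.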

    DATA TRANSFER TECHNIQUES WITHIN DATA STORAGE DEVICES, SUCH AS NETWORK ATTACHED STORAGE PERFORMING DATA MIGRATION

    Publication No.: US20200228598A1

    Publication date: 2020-07-16

    Application No.: US16732262

    Filing date: 2019-12-31

    Abstract: A stand-alone, network accessible data storage device, such as a filer or NAS device, is capable of transferring data objects based on portions of the data objects. The device transfers portions of files, folders, and other data objects from a data store within the device to external secondary storage based on certain criteria, such as time-based criteria, age-based criteria, and so on. A portion may be one or more blocks of a data object, or one or more chunks of a data object, or other segments that combine to form or store a data object. For example, the device identifies one or more blocks of a data object that satisfy certain criteria, and migrates the identified blocks to external storage, thereby freeing up storage space within the device. The device may determine that a certain number of blocks of a file have not been modified or called by a file system in a certain time period, and migrate these blocks to secondary storage.
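Block-level migration by idle time might be sketched as below. The function name `migrate_cold_blocks`, the per-block `last_access` timestamps, the `max_idle` threshold, and the use of `None` as a placeholder for a migrated block are all assumptions chosen for illustration.

```python
def migrate_cold_blocks(blocks, last_access, now, max_idle, secondary):
    """Move blocks idle for longer than `max_idle` to secondary storage.

    `blocks` is the file's block list on the device; a migrated block is
    replaced by None (an assumed placeholder) to free local space.
    """
    moved = []
    for block_no, data in enumerate(blocks):
        if data is not None and now - last_access[block_no] > max_idle:
            secondary[block_no] = data
            blocks[block_no] = None
            moved.append(block_no)
    return moved
```

Only the cold blocks leave the device; blocks of the same file that were accessed recently stay in place.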

    Efficient deduplication database validation

    Publication No.: US10572348B2

    Publication date: 2020-02-25

    Application No.: US16008591

    Filing date: 2018-06-14

    Abstract: According to certain aspects, a method can include receiving an indication that a restoration of a deduplication database using a secondary copy of a file associated with a secondary copy job is complete; retrieving a first data fingerprint from a data storage database, wherein the first data fingerprint is associated with the secondary copy job used to restore the deduplication database; retrieving a second data fingerprint from a deduplication database media agent, wherein the second data fingerprint is associated with the secondary copy job used to restore the deduplication database; comparing the first data fingerprint with the second data fingerprint to determine whether the first data fingerprint and the second data fingerprint match; and transmitting an instruction to the deduplication database media agent to rebuild the restored deduplication database in response to a determination that the first data fingerprint and the second data fingerprint do not match.
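The validation step reduces to comparing two fingerprints for the same secondary copy job and requesting a rebuild on mismatch. A minimal sketch, assuming fingerprints are opaque strings and using hypothetical names (`MediaAgent`, `fingerprint_for`, `rebuild`, `validate_restored_ddb`):

```python
class MediaAgent:
    """Hypothetical stand-in for the deduplication database media agent."""

    def __init__(self, fingerprints):
        self.fingerprints = fingerprints  # job_id -> fingerprint
        self.rebuild_requested = False

    def fingerprint_for(self, job_id):
        return self.fingerprints.get(job_id)

    def rebuild(self):
        self.rebuild_requested = True


def validate_restored_ddb(job_id, storage_db, media_agent):
    """Compare fingerprints after a restore; ask for a rebuild if they differ."""
    expected = storage_db[job_id]                # from the data storage database
    actual = media_agent.fingerprint_for(job_id)  # from the media agent
    if expected != actual:
        media_agent.rebuild()
        return False
    return True
```

A matching pair leaves the restored database in service; any mismatch triggers the rebuild instruction.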
