Deduplication database without reference counting

    Publication number: US12007967B2

    Publication date: 2024-06-11

    Application number: US17725451

    Filing date: 2022-04-20

    CPC classification number: G06F16/215 G06F16/2237 G06F16/2282 G06F16/278

    Abstract: A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to update the deduplication database and remove records corresponding to data blocks that have been or will be erased from the secondary copies, without using or tracking reference counting values. Some embodiments described herein use a secondary table to identify the corresponding records from the primary table that can be removed and/or moved to another table for storing “zero-reference” data blocks. In other embodiments, the system will then traverse the “zero-reference” table and remove those primary data blocks from secondary storage devices.
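
    The table-based cleanup described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `DedupDatabase`, its method names, and the in-memory dictionaries standing in for the primary, secondary, and "zero-reference" tables are all assumptions made for the example. The point it shows is that no per-block counter is stored; whether a block is still needed is decided by looking for remaining secondary-table records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DedupDatabase:
    # Primary table: block id -> location of the single stored copy.
    primary: dict[int, str] = field(default_factory=dict)
    # Secondary table: one (archive file id, block id) record per reference.
    secondary: set[tuple[str, int]] = field(default_factory=set)
    # "Zero-reference" table: blocks waiting to be erased from storage.
    zero_ref: dict[int, str] = field(default_factory=dict)

    def add_reference(self, archive_id: str, block_id: int, location: str) -> None:
        # A block that is referenced again before the sweep is rescued.
        if block_id in self.zero_ref:
            self.primary[block_id] = self.zero_ref.pop(block_id)
        self.primary.setdefault(block_id, location)
        self.secondary.add((archive_id, block_id))

    def prune_archive(self, archive_id: str) -> None:
        """Drop one archive's secondary records and move orphaned primary records."""
        dropped = {b for (a, b) in self.secondary if a == archive_id}
        self.secondary = {(a, b) for (a, b) in self.secondary if a != archive_id}
        still_referenced = {b for (_, b) in self.secondary}
        for block_id in dropped - still_referenced:
            self.zero_ref[block_id] = self.primary.pop(block_id)

    def sweep(self, storage: dict[str, bytes]) -> list[str]:
        """Traverse the zero-reference table and erase those blocks from storage."""
        erased = []
        for block_id, location in list(self.zero_ref.items()):
            storage.pop(location, None)
            erased.append(location)
            del self.zero_ref[block_id]
        return erased


# Hypothetical data: two archive files share block 1; only "af1" uses block 2.
db = DedupDatabase()
storage = {"loc-1": b"aaa", "loc-2": b"bbb"}
db.add_reference("af1", 1, "loc-1")
db.add_reference("af2", 1, "loc-1")
db.add_reference("af1", 2, "loc-2")
db.prune_archive("af1")
erased = db.sweep(storage)
```

    In this sketch, pruning "af1" leaves block 1 in the primary table because "af2" still has a secondary record for it, while block 2 passes through the zero-reference table and is erased by the sweep.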

    Block-level single instancing
    Granted invention patent

    Publication number: US11709739B2

    Publication date: 2023-07-25

    Application number: US17884482

    Filing date: 2022-08-09

    Abstract: Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.
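
    The uniqueness check described in this abstract might look like the sketch below. It is illustrative only: the abstract does not say how uniqueness is determined, so the use of a SHA-256 digest as the lookup key is an assumption, as are the class name `SingleInstanceStore`, the `put` method, and the list that stands in for the storage devices.

```python
import hashlib


class SingleInstanceStore:
    def __init__(self) -> None:
        # Stand-in for the single instance database: digest -> storage slot.
        self.index: dict[str, int] = {}
        # Stand-in for the storage devices holding secondary copies.
        self.blocks: list[bytes] = []

    def put(self, block: bytes) -> tuple[int, bool]:
        """Return (slot, stored); stored is False when the block was a duplicate."""
        digest = hashlib.sha256(block).hexdigest()
        slot = self.index.get(digest)
        if slot is not None:
            # An instance already exists, so the block is not stored again.
            return slot, False
        self.blocks.append(block)
        slot = len(self.blocks) - 1
        self.index[digest] = slot
        return slot, True


store = SingleInstanceStore()
results = [store.put(b) for b in (b"alpha", b"beta", b"alpha")]
```

    Here the third block repeats the first, so it resolves to the existing slot and only two blocks end up on the stand-in storage.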

    Block-level single instancing
    Granted invention patent

    Publication number: US11455212B2

    Publication date: 2022-09-27

    Application number: US17169257

    Filing date: 2021-02-05

    Abstract: Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.

    Partial file restore in a data storage system

    Publication number: US20190243718A1

    Publication date: 2019-08-08

    Application number: US16232965

    Filing date: 2018-12-26

    Abstract: The data storage system according to certain aspects can implement partial file restore, where only a portion of the secondary copy of a file is restored. Such portion may be designated by one or more application offsets for the file. The system may provide an in-chunk index that includes mapping information between the application offsets and the secondary copy offsets. Chunks may refer to logical data units in which secondary copies are stored, and the in-chunk index for a chunk may be stored in secondary storage with the chunk. Because the mapping information may not be provided at a fixed interval, the system can search through application offsets in the in-chunk index to locate the secondary copy offset corresponding to the application offset(s) of the designated portion. In this manner, the system may restore the designated portion of the secondary copy in a fast and efficient manner by using the in-chunk index.
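
    One way to picture the in-chunk index lookup is a binary search over irregularly spaced entries. The sketch below is an assumption-laden illustration: the index layout (a sorted list of application offset and secondary copy offset pairs), the sample values, and the function name `locate` are all hypothetical, and the abstract does not specify binary search as the search method.

```python
from bisect import bisect_right

# Hypothetical in-chunk index: (application offset, secondary copy offset),
# sorted by application offset and recorded at irregular intervals.
IN_CHUNK_INDEX = [(0, 0), (4096, 1500), (10000, 3900), (65536, 20480)]


def locate(index, app_offset):
    """Return the last index entry whose application offset is <= app_offset."""
    app_offsets = [app for app, _ in index]
    pos = bisect_right(app_offsets, app_offset) - 1
    if pos < 0:
        raise ValueError("offset precedes the first index entry")
    return index[pos]


entry = locate(IN_CHUNK_INDEX, 12000)
exact = locate(IN_CHUNK_INDEX, 4096)
```

    Because the entries are not evenly spaced, the position cannot be computed by division; the search returns the nearest preceding entry, from which a restore could begin reading the secondary copy.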

    Highly reusable deduplication database after disaster recovery

    Publication number: US10339106B2

    Publication date: 2019-07-02

    Application number: US14682988

    Filing date: 2015-04-09

    Abstract: According to certain aspects, a method can include receiving, in response to an indication that a data storage database is being restored to a second time before a first time such that the data storage database comprises a plurality of first archive file identifiers associated at the second time, a first instruction from a data storage computer, where the first instruction instructs a media agent to stop scheduled secondary storage operations associated with a deduplication database, and where the deduplication database comprises a plurality of second archive file identifiers; determining at least one second archive file identifier in the plurality of second archive file identifiers that does not correlate with any first archive identifier in the plurality of first archive file identifiers; and, for each of the at least one second archive identifier, instructing the deduplication database to prune an entry associated with the respective second archive file identifier.
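
    The pruning step in this abstract amounts to a set difference between the archive file identifiers known to the deduplication database and those present in the restored data storage database. The following sketch assumes hypothetical names (`stale_archive_ids`, `prune`) and models the deduplication database as a plain dictionary keyed by archive file identifier; stopping scheduled secondary storage operations is outside its scope.

```python
def stale_archive_ids(restored_db_ids, ddb_ids):
    """Identifiers in the deduplication database with no match in the restored database."""
    return sorted(set(ddb_ids) - set(restored_db_ids))


def prune(ddb, restored_db_ids):
    """Remove the entries for stale identifiers and report what was pruned."""
    removed = stale_archive_ids(restored_db_ids, ddb)
    for archive_id in removed:
        del ddb[archive_id]
    return removed


# Hypothetical state: the restored database only knows archive files 101 and 102,
# while the deduplication database also recorded 103 and 104 after that point in time.
restored_ids = {101, 102}
ddb = {101: ["blk-a"], 102: ["blk-b"], 103: ["blk-c"], 104: ["blk-a", "blk-d"]}
pruned = prune(ddb, restored_ids)
```

    After pruning, the deduplication database in this example keeps only the entries that the restored database can still account for, which is what allows it to be reused rather than rebuilt.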

    Client-side repository in a networked deduplicated storage system

    Publication number: US10191816B2

    Publication date: 2019-01-29

    Application number: US14673021

    Filing date: 2015-03-30

    Abstract: A storage system according to certain embodiments includes a client-side repository (CSR). The CSR may communicate with a client at a higher data transfer rate than the rate used for communication between the client and secondary storage. During copy operations, for instance, some or all of the data being backed up or otherwise copied to secondary storage is stored in the CSR. During restore operations, copies of the data stored in the CSR are accessed from the CSR instead of from secondary storage, improving performance. Remaining data blocks not stored in the CSR can be restored from secondary storage.
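
    The restore path described in this abstract resembles a cache lookup with fallback. The sketch below is only a rough model under stated assumptions: the function name `restore`, the block identifiers, and the two dictionaries standing in for the CSR and secondary storage are invented for illustration.

```python
def restore(block_ids, csr, secondary):
    """Fetch each block from the CSR when present, otherwise from secondary storage."""
    restored = {}
    sources = {}
    for block_id in block_ids:
        if block_id in csr:
            restored[block_id] = csr[block_id]
            sources[block_id] = "csr"
        else:
            restored[block_id] = secondary[block_id]
            sources[block_id] = "secondary"
    return restored, sources


# Hypothetical contents: the CSR holds only some of the backed-up blocks.
csr = {"b1": b"one", "b3": b"three"}
secondary = {"b1": b"one", "b2": b"two", "b3": b"three"}
restored, sources = restore(["b1", "b2", "b3"], csr, secondary)
```

    In this example only block "b2" has to travel over the slower link to secondary storage; the other two are served from the CSR.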
