FAST ALGORITHM TO FIND FILE SYSTEM DIFFERENCE FOR DEDUPLICATION

    Publication No.: US20210064580A1

    Publication date: 2021-03-04

    Application No.: US16552965

    Filing date: 2019-08-27

    Applicant: VMware, Inc.

    Abstract: The disclosure provides techniques for deduplicating files. The techniques include, upon creating or modifying a file, placing a logical timestamp of the current logical time within a queue associated with the directory of the file. The techniques further include placing the logical timestamp within a queue of each parent directory of the directory of the file. To determine a set of files for deduplication, the techniques disclosed herein identify files that have been modified within a logical time range. This set is identified by traversing the directories of a storage system, which are organized in a tree structure. If a directory's queue does not contain a timestamp within the logical time range, then all of its child directories can be skipped, so that no files within those child directories are included in the set of files for deduplication.
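
    A minimal Python sketch of the scheme described above, written for illustration only and not taken from the patent. It assumes a single global integer logical clock, unbounded per-directory queues, and a per-file last-write timestamp; the names `Directory`, `FileSystem`, `write` and `modified_in_range` are hypothetical. Because timestamps are appended in increasing order, each queue stays sorted, so a binary search decides whether a directory's subtree can be pruned.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Directory:
    name: str
    parent: Optional["Directory"] = None
    children: list = field(default_factory=list)
    files: dict = field(default_factory=dict)  # file name -> logical time of last write
    queue: list = field(default_factory=list)  # logical timestamps, appended in increasing order


class FileSystem:
    def __init__(self):
        self.root = Directory("/")
        self.clock = 0  # hypothetical global logical clock

    def mkdir(self, parent: Directory, name: str) -> Directory:
        child = Directory(name, parent)
        parent.children.append(child)
        return child

    def write(self, directory: Directory, filename: str) -> int:
        """Create or modify a file and stamp its directory and every ancestor."""
        self.clock += 1
        directory.files[filename] = self.clock
        d = directory
        while d is not None:
            d.queue.append(self.clock)
            d = d.parent
        return self.clock


def has_stamp_in_range(queue: list, lo: int, hi: int) -> bool:
    # The queue is sorted because timestamps only grow, so binary search suffices.
    i = bisect_left(queue, lo)
    return i < len(queue) and queue[i] <= hi


def modified_in_range(root: Directory, lo: int, hi: int) -> set:
    """Return (directory name, file name) pairs whose last write falls in [lo, hi]."""
    found = set()
    stack = [root]
    while stack:
        d = stack.pop()
        if not has_stamp_in_range(d.queue, lo, hi):
            continue  # nothing under d changed in range: prune the whole subtree
        found.update((d.name, f) for f, ts in d.files.items() if lo <= ts <= hi)
        stack.extend(d.children)
    return found
```

    For example, after writing `x` in `/b`, then `y` in `/a/c`, then `z` in `/a` (logical times 1, 2, 3), a query for the range [2, 3] returns `{("a", "z"), ("c", "y")}` and never descends into `/b`, whose queue holds only timestamp 1. How the real system bounds or trims its queues is not stated in the abstract.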

    MIGRATING VIRTUAL MACHINES IN CLUSTER MEMORY SYSTEMS

    Publication No.: US20230023696A1

    Publication date: 2023-01-26

    Application No.: US17495846

    Filing date: 2021-10-07

    Applicant: VMWARE, INC.

    Abstract: Disclosed are various embodiments for optimizing the migration of processes or virtual machines in cluster memory systems. To begin, a first computing device can identify a set of pages allocated to a process or virtual machine hosted by the first computing device. Then, the first computing device can identify a subset of the allocated pages that have been accessed with at least a predefined frequency. Next, the first computing device can copy this subset of the allocated pages to a second computing device. Subsequently, the first computing device can copy a page mapping table to the second computing device, the page mapping table specifying which pages in the set of pages allocated to the process or virtual machine are stored by a memory host. Finally, the first computing device can copy the remaining allocated pages to the second computing device.
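
    The three-phase ordering can be illustrated with a small simulation. This is a hypothetical sketch, not VMware's implementation: pages are plain dictionary entries, `access_counts` and `min_freq` stand in for the access tracking and the predefined frequency, and the page mapping table is copied as an opaque dictionary because the abstract does not describe its format or how pages held by the memory host are handled.

```python
def migrate(src_pages: dict, allocated: list, access_counts: dict,
            page_table: dict, min_freq: int):
    """Simulate the three-phase copy; returns (dest_pages, dest_table, order)."""
    dest_pages = {}
    order = []  # records what was sent to the destination, in sequence

    # Phase 1: pages accessed at least min_freq times are sent first.
    hot = [p for p in allocated if access_counts.get(p, 0) >= min_freq]
    for p in hot:
        dest_pages[p] = src_pages[p]
        order.append(("page", p))

    # Phase 2: the page mapping table (treated here as an opaque dict).
    dest_table = dict(page_table)
    order.append(("page_table", None))

    # Phase 3: every remaining allocated page.
    hot_set = set(hot)
    for p in allocated:
        if p not in hot_set:
            dest_pages[p] = src_pages[p]
            order.append(("page", p))

    return dest_pages, dest_table, order
```

    With pages 0 to 3, access counts `{0: 5, 2: 1, 3: 7}` and `min_freq=3`, the transfer order is page 0, page 3, the page table, then pages 1 and 2.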

    SMALL IN-MEMORY CACHE TO SPEED UP CHUNK STORE OPERATION FOR DEDUPLICATION

    Publication No.: US20210064581A1

    Publication date: 2021-03-04

    Application No.: US16552976

    Filing date: 2019-08-27

    Applicant: VMware, Inc.

    Abstract: The present disclosure provides techniques for deduplicating files. The techniques include creating a cache, or subset, of a large data structure. The large data structure organizes information by random hash values. The random hash values result in a random organization of information within the data structure, with the information spanning a large number of storage blocks within a storage system. The cache, however, is in memory and is small relative to the data structure. The cache is created so as to contain information that is likely to be needed during deduplication of a file. Keeping the needed information in memory rather than in storage makes reads and writes of that information faster, improving the performance of the computing system.
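
    The abstract does not say how the cache decides which information is likely to be needed. The sketch below assumes one plausible heuristic, labeled as an assumption: chunk IDs are assigned sequentially, and when one chunk of a file matches, the chunks stored right after it are likely to match the file's next chunks, so a miss prefetches a small window of neighboring entries into an LRU cache. `ChunkStore`, `PrefetchCache`, `window` and `capacity` are hypothetical names, and SHA-256 stands in for whatever hash the system uses.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


def chunk_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Stand-in for the large on-storage structures (hypothetical layout)."""

    def __init__(self):
        self.hash_to_id = {}  # chunk hash -> chunk ID; hash order is effectively random
        self.id_to_hash = {}  # chunk ID -> chunk hash; IDs are assigned sequentially
        self.lookups = 0      # times the slow hash-keyed structure was consulted

    def add(self, data: bytes) -> int:
        h = chunk_hash(data)
        if h not in self.hash_to_id:
            cid = len(self.id_to_hash)
            self.hash_to_id[h] = cid
            self.id_to_hash[cid] = h
        return self.hash_to_id[h]

    def lookup(self, h: str) -> Optional[int]:
        self.lookups += 1
        return self.hash_to_id.get(h)


class PrefetchCache:
    """Small LRU cache that, on a miss, also loads the next `window` chunk IDs."""

    def __init__(self, store: ChunkStore, capacity: int = 64, window: int = 8):
        self.store = store
        self.capacity = capacity
        self.window = window
        self.entries = OrderedDict()  # chunk hash -> chunk ID, least recently used first

    def _put(self, h: str, cid: int) -> None:
        self.entries[h] = cid
        self.entries.move_to_end(h)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def find(self, h: str) -> Optional[int]:
        if h in self.entries:
            self.entries.move_to_end(h)
            return self.entries[h]
        cid = self.store.lookup(h)
        if cid is not None:
            # Assumed locality: chunks stored right after a matching chunk
            # are likely to match the next chunks of the file being deduplicated.
            for n in range(cid, cid + self.window):
                nh = self.store.id_to_hash.get(n)
                if nh is not None:
                    self._put(nh, n)
        return cid
```

    Looking up 20 sequentially stored chunks through this cache with `window=8` consults the hash-keyed structure 3 times instead of 20.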

    PROBABILISTIC ALGORITHM TO CHECK WHETHER A FILE IS UNIQUE FOR DEDUPLICATION

    Publication No.: US20210064579A1

    Publication date: 2021-03-04

    Application No.: US16552908

    Filing date: 2019-08-27

    Applicant: VMware, Inc.

    Abstract: The disclosed techniques relate to deduplication. The techniques include determining whether a file is unique and, depending on whether it is, deduplicating only part of the file or the entire file. The techniques include processing the first chunk of a file to determine whether the hash of the chunk is already within a chunk hash table; if it is not, a percentage of the file's chunks is processed in the same way. If any of the processed chunks' hashes are already in the chunk hash table, then at least some of the file has been previously deduplicated, and the file is not unique within the storage system. If none of the processed chunks has a hash that is already in the chunk hash table, then the file is considered unique within the chunk store, and only a partial percentage of the file's chunks is deduplicated. Not all of a unique file's chunks are deduplicated.
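
    A minimal sketch of the uniqueness check and partial deduplication, under assumptions the abstract leaves open: the sampled chunks are evenly spaced, the "partial percentage" of a unique file that gets deduplicated is that same sample, and deduplicating a chunk is modeled as incrementing a counter for its hash in a dictionary standing in for the chunk hash table. All function names are hypothetical, and files are assumed to have at least one chunk.

```python
import hashlib


def chunk_hash(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def sample_indices(n: int, percent: int) -> list:
    """Evenly spaced indices covering about `percent`% of n chunks (assumed policy)."""
    k = max(1, -(-n * percent // 100))  # ceiling division, at least one chunk
    step = max(1, n // k)
    return list(range(0, n, step))[:k]


def is_unique(chunks: list, table: dict, percent: int) -> bool:
    # Check the first chunk, then a sampled percentage of the file.
    if chunk_hash(chunks[0]) in table:
        return False
    for i in sample_indices(len(chunks), percent):
        if chunk_hash(chunks[i]) in table:
            return False
    return True


def deduplicate(chunks: list, table: dict, percent: int = 10):
    """Return (file considered unique?, number of chunks deduplicated)."""
    unique = is_unique(chunks, table, percent)
    # A unique file only gets a partial sample deduplicated; otherwise all chunks.
    targets = sample_indices(len(chunks), percent) if unique else range(len(chunks))
    for i in targets:
        h = chunk_hash(chunks[i])
        table[h] = table.get(h, 0) + 1
    return unique, len(targets)
```

    The check is probabilistic: a file that shares chunks with the store only at positions that are not sampled is still classified as unique. For a 100-chunk file with `percent=10`, a unique file gets 10 chunks deduplicated, while a copy of an already deduplicated file gets all 100.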

    EFFICIENT GARBAGE COLLECTION OF VARIABLE SIZE CHUNKING DEDUPLICATION

    Publication No.: US20210064522A1

    Publication date: 2021-03-04

    Application No.: US16552954

    Filing date: 2019-08-27

    Applicant: VMware, Inc.

    Abstract: The present disclosure provides techniques for deallocating previously allocated storage blocks. The techniques include obtaining a list of chunk IDs to analyze, choosing a chunk ID, and determining the storage blocks spanned by the chunk corresponding to the chosen chunk ID. The techniques further include determining whether any file references any storage block spanned by the chunk. This determination may be performed by comparing an internal reference count to a total reference count, where the internal reference count is the number of references to the storage block by a chunk ID data structure. If no files reference any of the storage blocks spanned by the chunk, then all the storage blocks of the chunk can be deallocated.
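
    The reference-count comparison can be sketched as follows. This is an illustrative model, not the patented implementation: each storage block carries a total reference count and an internal reference count (references from the chunk ID data structure), and a chunk's blocks are freed only if the two counts are equal on every block it spans. `Block`, `collect_garbage` and the dictionary layouts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    total_refs: int = 0     # all references: files plus chunk metadata
    internal_refs: int = 0  # references held by the chunk ID data structure only


def collect_garbage(chunk_ids: list, chunk_blocks: dict, blocks: dict) -> list:
    """Free the blocks of every listed chunk that no file references.

    chunk_blocks maps chunk ID -> list of block numbers the chunk spans;
    blocks maps block number -> Block. Both are mutated in place.
    Returns the block numbers that were freed, in order.
    """
    freed = []
    for cid in chunk_ids:
        spanned = [b for b in chunk_blocks[cid] if b in blocks]  # skip blocks already freed
        # total == internal on every block means only chunk metadata points at them.
        if all(blocks[b].total_refs == blocks[b].internal_refs for b in spanned):
            for b in spanned:
                del blocks[b]
                freed.append(b)
            del chunk_blocks[cid]
    return freed
```

    In this model a block shared by two variable-size chunks holds one internal reference per chunk, so a file that references either chunk keeps both chunks' blocks allocated.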
