Fast algorithm to find file system difference for deduplication

    Publication No.: US11775484B2

    Publication date: 2023-10-03

    Application No.: US16552965

    Filing date: 2019-08-27

    Applicant: VMware, Inc.

    CPC classification number: G06F16/1752 G06F16/152 G06F16/9027

    Abstract: The disclosure provides techniques for deduplicating files. The techniques include, upon creating or modifying a file, placing a logical timestamp of the current logical time within a queue associated with the directory of the file. The techniques further include placing the logical timestamp within a queue of each parent directory of the directory of the file. To determine a set of files for deduplication, the techniques disclosed herein identify files that have been modified within a logical time range. The set of files modified within the logical time range is identified by traversing directories of a storage system, the directories being organized within a tree structure. If a directory's queue does not contain a timestamp that is within the logical time range, then all of its child directories can be skipped, so that no files within those child directories end up in the set of files for deduplication.
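The traversal described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Directory`, `LogicalClock`, `touch`, and `modified_files` are hypothetical, the queues are unbounded, and the logical clock is a plain counter, all of which are assumptions made for illustration.

```python
from collections import deque


class Directory:
    """A directory node holding files and a queue of logical timestamps."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.files = {}       # file name -> logical time of last create/modify
        self.queue = deque()  # timestamps of changes anywhere in this subtree
        if parent is not None:
            parent.children.append(self)


class LogicalClock:
    """Monotonic counter standing in for the logical time (an assumption)."""

    def __init__(self):
        self.now = 0

    def tick(self):
        self.now += 1
        return self.now


def touch(clock, directory, filename):
    """Create or modify a file and record the timestamp up to the root."""
    ts = clock.tick()
    directory.files[filename] = ts
    node = directory
    while node is not None:
        node.queue.append(ts)  # the file's directory, then every parent
        node = node.parent
    return ts


def modified_files(root, lo, hi):
    """Return paths of files whose last change falls within [lo, hi]."""
    found = []
    stack = [(root, root.name)]
    while stack:
        node, path = stack.pop()
        # No timestamp in range: skip this directory and its whole subtree.
        if not any(lo <= ts <= hi for ts in node.queue):
            continue
        for fname, ts in node.files.items():
            if lo <= ts <= hi:
                found.append(path + "/" + fname)
        for child in node.children:
            stack.append((child, path + "/" + child.name))
    return sorted(found)
```

Because each change is recorded in every ancestor's queue, a directory whose queue has no timestamp in the range cannot contain a matching file anywhere beneath it, which is what makes the pruning safe in this sketch.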

    Creating, by host computers, respective object of virtual disk based on virtual disk blueprint

    Publication No.: US11210035B2

    Publication date: 2021-12-28

    Application No.: US16988242

    Filing date: 2020-08-07

    Applicant: VMware, Inc.

    Abstract: Techniques are described for storing a virtual disk in an object store comprising a plurality of physical storage devices housed in a plurality of host computers. A profile is received for creation of the virtual disk, wherein the profile specifies storage properties desired for an intended use of the virtual disk. A virtual disk blueprint is generated based on the profile such that the virtual disk blueprint describes a storage organization for the virtual disk that addresses redundancy or performance requirements corresponding to the profile. A set of the physical storage devices that can store components of the virtual disk in a manner that satisfies the storage organization is then determined.

    Decoupling Compute and Storage Resources in Cloud-Based HCI (Hyper-Converged Infrastructure)

    Publication No.: US20200183720A1

    Publication date: 2020-06-11

    Application No.: US16211047

    Filing date: 2018-12-05

    Applicant: VMware, Inc.

    Abstract: Techniques for decoupling compute and storage resources in a hyper-converged infrastructure (HCI) are provided. In one set of embodiments, a control plane of the HCI deployment can provision a host from a host platform of an infrastructure on which the HCI deployment is implemented and can provision one or more storage volumes from a storage platform of the infrastructure, where the storage platform runs on physical server resources in the infrastructure that are separate from the host platform. The control plane can then cause the one or more storage volumes to be network-attached to the host in a manner that enables a hypervisor of the host to make the one or more storage volumes available, as part of a virtual storage pool, to one or more virtual machines in the HCI deployment for data storage.

    Log-structured storage device format

    Publication No.: US10402374B2

    Publication date: 2019-09-03

    Application No.: US14469418

    Filing date: 2014-08-26

    Applicant: VMware, Inc.

    Abstract: Embodiments of the disclosure provide techniques for managing a log-structured solid state drive (SSD) format in a distributed storage system. SSDs in the distributed storage system maintain a journal of logical changes to storage objects to persist prepared and committed changes in the latency path. The journal includes metadata entries that describe changes and reference data pages. Dense data structures (such as a logical block addressing table) index the metadata entries. To reduce the amount of overhead in I/O operations, the distributed storage system maintains the dense data structures in memory rather than on disk.
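As a rough illustration of the relationship between the journal, the data pages, and the in-memory index, the following Python sketch uses plain lists for the journal and pages and a dictionary for the logical block addressing table. The class name `LogStructuredDevice`, the entry layout, and the method names are hypothetical and are not taken from the patent.

```python
class LogStructuredDevice:
    """Toy model: append-only journal plus an in-memory LBA index."""

    def __init__(self):
        self.pages = []      # data pages (stand-in for pages on the SSD)
        self.journal = []    # append-only metadata entries describing changes
        self.lba_table = {}  # in-memory only: LBA -> index of latest entry

    def write(self, lba, data):
        """Append a data page and a metadata entry, then update the index."""
        self.pages.append(data)
        entry = {"lba": lba, "page": len(self.pages) - 1}
        self.journal.append(entry)
        self.lba_table[lba] = len(self.journal) - 1

    def read(self, lba):
        """Resolve an LBA through the in-memory table; None if never written."""
        idx = self.lba_table.get(lba)
        if idx is None:
            return None
        return self.pages[self.journal[idx]["page"]]
```

In this sketch an overwrite never modifies an earlier journal entry; it appends a new one and repoints the table, so a read needs only one dictionary lookup rather than a scan of the journal.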

    Providing end-to-end checksum within a distributed virtual storage area network module

    Publication No.: US10102057B2

    Publication date: 2018-10-16

    Application No.: US14716756

    Filing date: 2015-05-19

    Applicant: VMware, Inc.

    Abstract: Exemplary methods, apparatuses, and systems include a first layer of a virtual storage area network (VSAN) module receiving a write request from a data compute node. The write request includes data to be written and the VSAN module is distributed across a plurality of computers to provide an aggregate object store using storage attached to each of the plurality of computers. The first layer of the VSAN module calculates a checksum for the data to be written and passes the data to be written and the checksum to a second layer of the VSAN module. The second layer of the VSAN module calculates a first verification checksum for the data to be written. The data and the checksum are written to persistent storage in response to determining the first verification checksum matches the checksum passed by the first layer of the VSAN module.
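The two-layer handoff in this abstract can be sketched in Python as follows. The abstract does not name a checksum algorithm, so CRC-32 via `zlib.crc32` is an assumption, as are the function names, the dictionary standing in for persistent storage, and the `corrupt` hook used to simulate damage between layers.

```python
import zlib


class ChecksumMismatch(Exception):
    """Raised when the second layer's verification checksum differs."""


def second_layer_write(data, checksum, store, key):
    """Recompute the checksum and persist only if it matches."""
    verification = zlib.crc32(data)
    if verification != checksum:
        raise ChecksumMismatch(f"{key}: expected {checksum}, got {verification}")
    store[key] = (data, checksum)  # data and checksum are written together


def first_layer_write(data, store, key, corrupt=None):
    """Checksum the data on entry, then hand both to the second layer."""
    checksum = zlib.crc32(data)
    if corrupt is not None:
        # Hypothetical hook: simulates corruption between the two layers.
        data = corrupt(data)
    second_layer_write(data, checksum, store, key)
```

Computing the checksum as early as possible means any corruption introduced between the layers is caught before the data reaches persistent storage, and storing the checksum with the data allows later reads to be verified in the same way.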

    Distributed transaction log
    Patent application

    Publication No.: US20180067826A1

    Publication date: 2018-03-08

    Application No.: US15810650

    Filing date: 2017-11-13

    Applicant: VMware, Inc.

    Abstract: Embodiments of the disclosure provide techniques for updating a distributed transaction log on a previously offline resource object component using distributed transaction logs from active host computer nodes from separate RAID mirror configurations. Each component object maintains a journal (log) where distributed transactions are recorded. If a component object goes offline and subsequently returns (e.g., if the node hosting the component object reboots), the component object is marked as stale. To return the component object to an active state, a distributed resources module retrieves the journals from other resource component objects from other RAID configurations where the data is mirrored. The module filters corresponding data that is missing in the journal of the previously offline corresponding object and merges the filtered data to the journal.
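The filter-and-merge step can be sketched in Python under simplifying assumptions: each journal is a list of `(txn_id, payload)` tuples, transaction IDs are unique and ordered, and all mirrors agree on the payload for a given ID. The function name `resync_journal` is hypothetical.

```python
def resync_journal(stale_journal, mirror_journals):
    """Return the stale journal with entries it missed merged back in."""
    seen = {txn_id for txn_id, _ in stale_journal}
    missing = {}
    for journal in mirror_journals:
        for txn_id, payload in journal:
            # Filter: keep only transactions the stale component never saw.
            if txn_id not in seen and txn_id not in missing:
                missing[txn_id] = payload
    # Merge: combine existing and missing entries, ordered by transaction ID.
    merged = list(stale_journal) + list(missing.items())
    return sorted(merged, key=lambda entry: entry[0])
```

For example, a component that went offline after transaction 2 and has mirrors that recorded transactions 3 and 4 ends up with entries 1 through 4, with no duplicates even when several mirrors hold the same transaction.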

    Orchestrating high availability failover for virtual machines stored on distributed object-based storage
    Granted patent (in force)

    Publication No.: US09495259B2

    Publication date: 2016-11-15

    Application No.: US14317669

    Filing date: 2014-06-27

    Applicant: VMware, Inc.

    Abstract: Techniques are disclosed for orchestrating high availability (HA) failover for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for one or more of the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing an HA module determines a VM to be restarted on an active host system in the host cluster. The host system further determines if the VM's persistent data is stored in the object store. If so, the host system adds the VM to a list of VMs to be immediately restarted. Otherwise, the host system checks whether the VM is accessible to the host system by querying a storage layer of the host system configured to manage the object store.


    Probabilistic algorithm to check whether a file is unique for deduplication

    Publication No.: US11669495B2

    Publication date: 2023-06-06

    Application No.: US16552908

    Filing date: 2019-08-27

    Applicant: VMware, Inc.

    CPC classification number: G06F16/1752 G06F16/152

    Abstract: Disclosed techniques include deduplication. Techniques include determining whether a file is unique, and depending on whether the file is unique, deduplicating only part of the file or the entire file. The techniques include processing the first chunk of a file to determine whether the hash of the chunk is already within a chunk hash table, and if not, then a percentage of chunks of the file is similarly processed. If any of the hashes of chunks are already in the chunk hash table, then at least some of the file has been previously deduplicated, and the file is not unique within the storage system. If none of the processed chunks have a hash that is already in the chunk hash table, then the file is considered to be unique within the chunk store and only a partial percentage of the file's chunks are deduplicated. Not all of a unique file's chunks are deduplicated.
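A minimal Python sketch of this uniqueness check follows. Several details are assumptions rather than parts of the patent: fixed-size chunks of four bytes (tiny, for illustration), SHA-256 as the chunk hash, a Python `set` as the chunk hash table, and evenly spaced sampling that always starts with the first chunk. "Deduplicating" a chunk is reduced here to recording its hash in the table.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for illustration only


def split_chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hash(chunk):
    return hashlib.sha256(chunk).hexdigest()


def sample_indices(n_chunks, percent):
    """First chunk plus evenly spaced chunks, about `percent` of the file."""
    count = max(1, (n_chunks * percent) // 100)
    step = max(1, n_chunks // count)
    return list(range(0, n_chunks, step))[:count]


def ingest(data, table, percent=25):
    """Return (is_unique, number_of_chunks_deduplicated) and update table."""
    parts = split_chunks(data)
    if not parts:
        return True, 0
    sampled = sample_indices(len(parts), percent)
    # any() short-circuits, so the first chunk (index 0) is checked first.
    hit = any(chunk_hash(parts[i]) in table for i in sampled)
    if hit:
        # Not unique: deduplicate the entire file.
        chosen = range(len(parts))
    else:
        # Considered unique: deduplicate only the sampled chunks.
        chosen = sampled
    for i in chosen:
        table.add(chunk_hash(parts[i]))
    return (not hit), len(chosen)
```

The check is probabilistic in the sense that a file sharing only unsampled chunks with earlier data would still be classified as unique; the sampling percentage trades that risk against the cost of hashing every chunk.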
