Abstract:
The embodiments described herein relate to managing compressed data to optimize file compression for efficient random access to the data. A first partition of a first data block of a compression group is compressed. The first compressed partition is stored in a first compression entity. An in-memory table is maintained, which includes updating the in-memory table with data associated with an address of the stored compressed first partition. When the first compression entity is determined to be full, the in-memory table is compressed and written to the first compression entity. Accordingly, the in-memory table, which stores partition compression data, is stored with the compression entity.
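The flow in this abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the `CompressionEntity` class, its `capacity` and `table_reserve` parameters, the JSON table encoding, and the use of zlib are all assumptions made for illustration.

```python
import json
import zlib


class CompressionEntity:
    """Hypothetical fixed-capacity container for compressed partitions."""

    def __init__(self, capacity, table_reserve=64):
        self.capacity = capacity            # total bytes the entity may hold
        self.table_reserve = table_reserve  # assumed space kept back for the table
        self.payload = bytearray()
        self.table = []                     # in-memory table: (id, offset, length)
        self.table_offset = None
        self.sealed = False

    def add_partition(self, partition_id, data):
        """Compress and store one partition; return False once the entity is full."""
        if self.sealed:
            raise RuntimeError("entity is already sealed")
        compressed = zlib.compress(data)
        needed = len(self.payload) + len(compressed) + self.table_reserve
        if needed > self.capacity:
            self.seal()  # entity is full: persist the table alongside the data
            return False
        # Record where the compressed partition lives inside the entity.
        self.table.append((partition_id, len(self.payload), len(compressed)))
        self.payload += compressed
        return True

    def seal(self):
        """Compress the in-memory table and write it into the entity."""
        blob = zlib.compress(json.dumps(self.table).encode())
        self.table_offset = len(self.payload)
        self.payload += blob
        self.sealed = True

    def read_partition(self, partition_id):
        """Random access: consult the stored table, then decompress one partition."""
        if not self.sealed:
            raise RuntimeError("table has not been written yet")
        raw = zlib.decompress(bytes(self.payload[self.table_offset:]))
        for pid, offset, length in json.loads(raw):
            if pid == partition_id:
                return zlib.decompress(bytes(self.payload[offset:offset + length]))
        raise KeyError(partition_id)
```

Because the table travels with the entity, a reader needs only the entity itself to find and decompress a single partition. The fixed `table_reserve` is a simplification; a real layout would have to size the table space from the actual compressed table.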
Abstract:
Embodiments relate to policy-based, multi-scheme data reduction for a computer memory. An aspect includes receiving a plurality of policy rules by a policy engine of the computer memory, wherein a first policy rule specifies applying a first data reduction scheme to data in the computer memory based on the data matching first characteristics, wherein a second policy rule specifies applying a second data reduction scheme to data in the computer memory based on the data matching second characteristics, wherein the first data reduction scheme is different from the second data reduction scheme. Another aspect includes determining, by the policy engine, that first data in the computer memory matches the first characteristics, and that second data in the computer memory matches the second characteristics. Yet another aspect includes applying the first data reduction scheme to the first data, and applying the second data reduction scheme to the second data.
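As a rough illustration of the rule matching described here, the sketch below pairs each policy rule with a predicate over data characteristics and a reduction function. The names `PolicyRule`, `PolicyEngine`, and `DedupStore`, the metadata keys, and the choice of zlib compression and hash-based de-duplication as the two schemes are hypothetical; the abstract does not name specific schemes.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class PolicyRule:
    name: str
    matches: Callable[[dict], bool]   # tests the data's characteristics
    reduce: Callable[[bytes], bytes]  # the data reduction scheme to apply


class DedupStore:
    """Hypothetical content-addressed store: identical data is kept once."""

    def __init__(self):
        self.chunks = {}

    def put(self, data):
        key = hashlib.sha256(data).digest()
        self.chunks.setdefault(key, data)
        return key  # the caller keeps only the reference


class PolicyEngine:
    def __init__(self, rules):
        self.rules = list(rules)

    def apply(self, metadata, data):
        """Apply the first rule whose characteristics match the data."""
        for rule in self.rules:
            if rule.matches(metadata):
                return rule.name, rule.reduce(data)
        return "none", data  # no rule matched: leave the data as-is


store = DedupStore()
engine = PolicyEngine([
    PolicyRule("compress", lambda m: m.get("kind") == "text", zlib.compress),
    PolicyRule("dedup", lambda m: m.get("kind") == "image", store.put),
])
```

Here `engine.apply({"kind": "text"}, payload)` would route the payload through compression, while data tagged `"image"` would be de-duplicated instead.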
Abstract:
Embodiments are provided for enhancing storage efficiency in a de-duplication enabled storage system. Metadata of a shared-nothing clustered file system is scanned, and a first state of the storage system is determined. One or more cores are located from the metadata. Each core includes a grouping of objects having a minimum coreness. An arrangement of the located cores is optimized to improve global de-duplication efficiency by evaluating the objects of each core, identifying respective nodes in the storage system to maintain each core for de-duplication efficiency based on the evaluation, and re-arranging one or more of the evaluated objects in the storage system.
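The abstract does not define "coreness". Assuming it is meant in the graph-theoretic k-core sense (an assumption, not something the abstract confirms), locating a core can be sketched as repeatedly peeling away objects with fewer than `k` links. The graph below, in which an edge means two objects share duplicate content, is hypothetical.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core of an undirected graph.

    `adjacency` maps every node to the set of its neighbours and is assumed
    to be symmetric, with every node present as a key.
    """
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        # Peel off every node whose remaining degree is below k.
        for node in [n for n, nbrs in graph.items() if len(nbrs) < k]:
            for nbr in graph[node]:
                graph[nbr].discard(node)
            del graph[node]
            changed = True
    return set(graph)


# Hypothetical sharing graph: a, b, c share content pairwise; d shares only with a.
sharing = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
core = k_core(sharing, 2)
```

Under this reading, objects in the same core would be candidates for placement on the same node so that their shared content is de-duplicated locally.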
Abstract:
The embodiments described herein relate to managing compressed data to optimize file compression. A first compression is performed on a first set of data to create first compressed data. The first compressed data is stored in one or more blocks of a first compression group. A size of free space of a last block of the first compression group is discovered and calculated. A second compression is performed on a second set of data to create second compressed data. At least a portion of the second compressed data is supplied to the first compression group for padding into the last block in response to determining that the size of the free space is sufficient. An unpadded portion of the second compressed data is stored in one or more blocks of a second compression group.
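The padding step can be sketched with two small helpers. The 4096-byte block size, the function names, and the `min_pad` threshold used to decide whether the free space is "sufficient" are assumptions for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size


def free_space_in_last_block(compressed_len, block_size=BLOCK_SIZE):
    """Bytes left unused in the last block of a compression group."""
    used = compressed_len % block_size
    return 0 if used == 0 else block_size - used


def split_for_padding(second_compressed, free, min_pad=1):
    """Split the second compressed data into (padded portion, unpadded portion).

    The padded portion fills the free space of the first group's last block;
    the remainder goes to the blocks of a second compression group.
    """
    if free < min_pad:
        return b"", second_compressed  # free space is not sufficient
    return second_compressed[:free], second_compressed[free:]


# Example: 5000 compressed bytes occupy two blocks and leave 3192 bytes free.
free = free_space_in_last_block(5000)
padded, remainder = split_for_padding(bytes(4000), free)
```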
Abstract:
The embodiments described herein relate to managing compressed data to optimize file compression. A compression is performed on a first set of data to create a first set of compressed data partitions in a compression group. A partition table is constructed, and partition entries for the first data block are added to the table in conjunction with the first set. A current size of the compression group is assessed. In response to a compression ratio being greater than a target compression ratio and internal fragmentation of the compression group being smaller than a threshold, the compression group is dynamically completed. The dynamic completion decides a size for the compression group. The partition table is added to the compression group by assigning space within the first compression group for the table. The compression group is written to persistent storage.
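The completion test can be sketched as follows. The abstract does not say how the compression ratio or internal fragmentation is computed, so the definitions here (ratio as uncompressed bytes over allocated bytes, fragmentation as the unused fraction of the allocated blocks) and all parameter names are assumptions.

```python
import math


def should_complete(uncompressed_bytes, compressed_bytes, table_bytes,
                    block_size, target_ratio, max_fragmentation):
    """Decide whether to complete the group now; return (decision, blocks)."""
    stored = compressed_bytes + table_bytes  # the partition table needs space too
    if stored == 0:
        return False, 0
    blocks = math.ceil(stored / block_size)
    allocated = blocks * block_size
    ratio = uncompressed_bytes / allocated
    fragmentation = (allocated - stored) / allocated
    done = ratio > target_ratio and fragmentation < max_fragmentation
    return done, blocks


# 64 KiB of input compressed to 16000 bytes plus a 300-byte table fits 4 blocks tightly.
decision = should_complete(65536, 16000, 300, 4096, 3.0, 0.05)
```

In this sketch the returned block count plays the role of the size decided for the group at completion.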
Abstract:
A protocol is employed to estimate duplication of data in a storage system. The estimate is a factor in deciding whether to enable de-duplication and, if de-duplication is enabled, which data sets will be subject to it. The protocol includes a measurement procedure and an execution procedure. The measurement procedure characterizes data duplication in part of the data on the storage system, and the execution procedure uses the characterization to adjust the selection of which data sets are subject to de-duplication.
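A minimal sketch of the two procedures, assuming chunk-level hashing over a sample of each data set: `estimate_duplication` stands in for the measurement procedure and `select_for_dedup` for the execution procedure. The function names, the sampling stride, and the threshold are hypothetical.

```python
import hashlib


def estimate_duplication(chunks, sample_every=1):
    """Measurement: fraction of sampled chunks that duplicate another sampled chunk."""
    sampled = chunks[::sample_every]
    if not sampled:
        return 0.0
    unique = {hashlib.sha256(chunk).digest() for chunk in sampled}
    return 1 - len(unique) / len(sampled)


def select_for_dedup(datasets, threshold, sample_every=1):
    """Execution: keep only the data sets whose estimate meets the threshold."""
    return [
        name for name, chunks in datasets.items()
        if estimate_duplication(chunks, sample_every) >= threshold
    ]


datasets = {
    "backups": [b"a", b"a", b"b", b"a"],  # half of the chunks are repeats
    "media": [b"w", b"x", b"y", b"z"],    # no repeats
}
selected = select_for_dedup(datasets, threshold=0.3)
```

Sampling only part of the data keeps the measurement cheap, at the cost of missing duplicates whose copies fall outside the sample.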
Abstract:
Embodiments of the invention relate to compressed storage systems, and reducing metadata representing compressed data. Compressed data is stored in units referred to as partitions, with each partition having a header that contains a virtual address of data stored in the partition. A linear function is provided to represent a mapping between a virtual address segment and a compressed data extent, with a slope of the function representing an associated compression ratio. A read operation is supported by consulting the mapping and using the mapping to locate the corresponding compressed extent. Similarly, a write operation is supported by writing a new segment, compressing content in the segment, and computing a new mapping of the compressed segment metadata in memory. The new mapping is represented in the linear function.
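The linear mapping can be sketched as a four-field record. Here the slope is taken to be compressed length over uncompressed length, and the computed position is treated as an approximate location to be confirmed against the partition header; both the field names and that interpretation are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LinearMapping:
    virt_start: int  # first virtual address of the segment
    virt_len: int    # uncompressed length of the segment
    phys_start: int  # start of the compressed extent
    phys_len: int    # compressed length of the extent

    @property
    def slope(self):
        """Compressed bytes per virtual byte (assumed meaning of the slope)."""
        return self.phys_len / self.virt_len

    def locate(self, virt_addr):
        """Estimate where in the compressed extent a virtual address falls."""
        if not self.virt_start <= virt_addr < self.virt_start + self.virt_len:
            raise ValueError("address outside mapped segment")
        return self.phys_start + int(self.slope * (virt_addr - self.virt_start))


# A 1000-byte segment compressed 4:1 into an extent starting at offset 5000.
mapping = LinearMapping(virt_start=0, virt_len=1000, phys_start=5000, phys_len=250)
position = mapping.locate(400)
```

Storing four integers per segment instead of one entry per partition is what shrinks the metadata in this reading.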