Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with maintaining a fingerprint index having a plurality of fingerprint entries, each mapping a fingerprint to a storage location of a cache memory device, where the cache memory device caches some of the data blocks stored in a persistent storage device of a storage system, and where the fingerprint index is a partial index indexing a portion of the data stored in the cache memory device. In response to receiving a request to insert a new fingerprint, the method continues with evicting one of the fingerprint entries according to a predetermined eviction algorithm and inserting the new fingerprint into the evicted fingerprint entry.
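The evict-then-insert behavior described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the abstract says only that the eviction algorithm is "predetermined", so least-recently-used eviction is an assumption here, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class PartialFingerprintIndex:
    """Fixed-capacity map from fingerprint to cache location (hypothetical names)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # fingerprint -> cache storage location

    def lookup(self, fingerprint):
        """Return the cached location, or None if the fingerprint is not indexed."""
        location = self._entries.get(fingerprint)
        if location is not None:
            # Assumption: a hit refreshes recency for the LRU policy.
            self._entries.move_to_end(fingerprint)
        return location

    def insert(self, fingerprint, location):
        """Insert a fingerprint; return the evicted fingerprint, if any."""
        evicted = None
        if fingerprint in self._entries:
            self._entries.move_to_end(fingerprint)
        elif len(self._entries) >= self.capacity:
            # Index is full: evict the least recently used entry (assumed policy).
            evicted, _ = self._entries.popitem(last=False)
        self._entries[fingerprint] = location
        return evicted
```

Because the index is partial, a failed lookup does not prove the data is absent from the cache; it only means the fingerprint is not currently indexed.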
Abstract:
Techniques for sanitizing a storage system are described herein. In one embodiment, for each file stored in the storage system, a list of fingerprints representing data chunks of the file is obtained. In such an embodiment, for each of the fingerprints, a first container storing a data chunk corresponding to the fingerprint is identified, and a storage location of the first container in which the data chunk is stored is determined. In one embodiment, a bit in a copy bit vector (CBV) is populated based on the identified container and the storage location. In one embodiment, after all of the bits corresponding to the data chunks of the first container have been populated in the CBV, data chunks represented by the CBV are copied from the first container to a second container, and records of the data chunks in the first container are erased.
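The two phases described above, populating the CBV and then copying forward, can be sketched as follows. The in-memory data layout (a dict from fingerprint to a container ID and slot, and containers as Python lists) and the function names are assumptions made for illustration; erasure is modeled by overwriting slots with None.

```python
def build_cbv(files, index, container_sizes):
    """Mark every chunk that is referenced by at least one file.

    files: file name -> list of fingerprints (hypothetical layout)
    index: fingerprint -> (container_id, slot)
    container_sizes: container_id -> number of slots
    """
    cbv = {cid: [False] * size for cid, size in container_sizes.items()}
    for fingerprints in files.values():
        for fp in fingerprints:
            container_id, slot = index[fp]
            cbv[container_id][slot] = True
    return cbv


def copy_forward(container, bits):
    """Copy marked chunks to a new container, then erase the old one."""
    survivors = [chunk for chunk, live in zip(container, bits) if live]
    # Stand-in for erasing the chunk records in the first container.
    container[:] = [None] * len(container)
    return survivors
```

Chunks whose bit is never set are referenced by no file, so they are not copied and disappear when the first container is erased.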
Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with maintaining a file index having a plurality of extent entries, each extent entry corresponding to one of a plurality of file extents stored in a cache memory device that caches data stored in a persistent storage device of a storage system. The method continues with maintaining a fingerprint index having a plurality of fingerprint entries, each mapping a fingerprint to a data region of a file indexed in the file index, wherein each fingerprint indexed in the fingerprint index is retrieved from metadata stored in the persistent storage device of the storage system when one or more corresponding data chunks were accessed, and deduplicating and accessing the file extents stored in the cache memory device using the file index and the fingerprint index.
Abstract:
A computer-implemented method for caching content in a cache memory device is disclosed. The method starts with receiving a request for accessing a first data block associated with a first file, and a file manager provides access to the first data block in a persistent storage device of a storage system. The file manager then caches the first data block in a cache memory device, including deduplicating the first data block, wherein at least some of the data blocks stored in the cache memory device are deduplicated data blocks, and wherein at least one of the data blocks is referenced by different regions of an identical file or by different files.
Abstract:
A computer-implemented method and system for improving efficiency in a delta compression process in a data storage system selects a data chunk to delta compress and selects a set of candidate data chunks using a first selection mechanism. Throughput or resource utilization is monitored. In response to determining high resource availability or a high throughput level, a change is made to a second selection mechanism that increases the similarity of the set of candidates to the selected data chunk to improve compression. In response to determining low resource availability or low throughput, a change is made to a third selection mechanism that increases the throughput of the delta compression process.
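The switching rule described above can be sketched as a small decision function. The abstract gives no thresholds or metric, so the idle-CPU fraction, the threshold values, and the mechanism labels below are all assumptions chosen for illustration.

```python
def next_mechanism(current, idle_fraction, high=0.5, low=0.1):
    """Pick a candidate-selection mechanism from monitored resource availability.

    idle_fraction: share of resources currently free, in [0, 1] (assumed metric)
    high, low: hypothetical thresholds for "high" and "low" availability
    """
    if idle_fraction >= high:
        # Plenty of headroom: spend it on finding more similar candidates.
        return "more_similar"
    if idle_fraction <= low:
        # Under pressure: favor a cheaper search to keep throughput up.
        return "higher_throughput"
    # In between, keep whatever mechanism is already in use.
    return current
```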
Abstract:
A computer-implemented method and system for improving efficiency in a delta compression process in a data storage system selects a data chunk to delta compress and generates a sketch for the selected data chunk. The method and system search for a set of candidate data chunks with a matching sketch and rank the set of candidate data chunks by degree of sketch matching. Ties among the candidate data chunks are broken using location status data for each candidate, and the selected data chunk is delta compressed with a selected candidate data chunk. The delta compressed selected data chunk is then stored in a data storage system.
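The ranking and tie-breaking step described above can be sketched as follows. The abstract does not define the sketch format or the location status, so this assumes a sketch is a tuple of feature values, the degree of matching is the number of equal positions, and the location status is a boolean indicating whether the candidate is already in fast storage. All names are hypothetical.

```python
def pick_candidate(target_sketch, candidates):
    """Return the ID of the best delta-compression base, or None.

    candidates: list of (chunk_id, sketch, in_cache) tuples (assumed layout)
    """

    def score(candidate):
        _, sketch, in_cache = candidate
        matches = sum(a == b for a, b in zip(target_sketch, sketch))
        # Primary key: degree of sketch matching.
        # Tie-breaker: prefer candidates that are cheaper to read.
        return (matches, in_cache)

    matching = [c for c in candidates if score(c)[0] > 0]
    if not matching:
        return None
    return max(matching, key=score)[0]
```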
Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. In response to receiving a first request for caching a first file extent associated with a first file in a cache memory device, the method starts with generating a first fingerprint based on content of the first file extent. The method then continues with searching a fingerprint index based on the first fingerprint to determine whether the first file extent has been stored in the cache memory device. In response to determining that a fingerprint entry matching the first fingerprint is found, the method continues with associating a first identifier identifying the first file extent and the first file with a storage location of the cache memory device obtained from the matching fingerprint entry, without storing the first file extent in the cache memory device.
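The lookup-before-store flow described above can be sketched as follows. The fingerprint function (SHA-256), the list standing in for the cache memory device, and the class and attribute names are assumptions for illustration only.

```python
import hashlib


class DedupCache:
    """Toy deduplicating cache keyed by content fingerprint (hypothetical names)."""

    def __init__(self):
        self.fingerprint_index = {}  # fingerprint -> storage location
        self.file_index = {}         # (file_id, extent_id) -> storage location
        self.storage = []            # stand-in for the cache memory device

    def cache_extent(self, file_id, extent_id, data):
        """Cache an extent; return True if new data was actually stored."""
        fingerprint = hashlib.sha256(data).hexdigest()
        location = self.fingerprint_index.get(fingerprint)
        stored = location is None
        if stored:
            # No matching fingerprint entry: store the extent and index it.
            location = len(self.storage)
            self.storage.append(data)
            self.fingerprint_index[fingerprint] = location
        # Either way, point the (file, extent) identifier at the location.
        self.file_index[(file_id, extent_id)] = location
        return stored
```

When the fingerprint is already indexed, only the file index changes, so two extents with identical content share one stored copy.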
Abstract:
Systems and methods for accessing data stored in multiple locations. A cache and a storage system are associated with an index. Entries in the index identify locations of data in both the cache and the storage system. When an index lookup occurs and an entry in the index identifies at least two locations for the data, the locations are ordered based on at least one factor and the data stored in the optimal location as determined from the at least one factor is returned.
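The ordering step described above can be sketched as follows. The abstract says only that locations are ordered by "at least one factor", so expected read latency is an assumed factor here, and the data layout and function names are hypothetical.

```python
def order_locations(locations, latency_ms):
    """Sort (store_name, address) pairs by the assumed latency of each store."""
    return sorted(locations, key=lambda loc: latency_ms[loc[0]])


def read(key, index, stores, latency_ms):
    """Return the data for key from the best-ranked location."""
    locations = index.get(key, [])
    if not locations:
        raise KeyError(key)
    store_name, address = order_locations(locations, latency_ms)[0]
    return stores[store_name][address]
```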
Abstract:
Systems and methods for managing data structures in a flash memory. A library is provided that supports read requests and write requests. The library allows reads and writes to be implemented without requiring the client to understand how the data structure is implemented in the flash memory.
Abstract:
Systems and methods for managing content in a flash memory. Content or data in a flash memory is overwritten when the write operation only requires bits to be set. This improves performance of the flash and extends the life of the flash memory.
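The overwrite condition described above can be sketched as a bitwise check. This follows the abstract's wording and assumes that a bit may change from 0 to 1 in place while clearing a bit requires an erase; real flash parts may use the opposite polarity. The function name is hypothetical.

```python
def can_overwrite_in_place(old, new):
    """True if writing new over old only sets bits (never clears one).

    old, new: bytes-like objects of equal length
    """
    if len(old) != len(new):
        return False
    # Every bit already set in old must still be set in new.
    return all((o & n) == o for o, n in zip(old, new))
```

A structure that only ever sets bits, such as a Bloom filter, passes this check on every update and so would never force an erase under this assumption.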