摘要:
Managing data blocks stored in a data processing system comprises logging input/output (I/O) accesses to an in-memory buffer with each log entry recording at least identifiers of the data blocks accessed and when the data blocks were accessed. When the size of the log entries reaches a predetermined threshold, the system may append log entries of the in-memory buffer to the end of a history log file. The history log file is analyzed to determine patterns of accesses, and each pattern is stored in a record in an access heuristics database. While processing a request for access to a data block, the data processing system queries the access heuristics database to obtain prior access patterns associated with the data block. A data management action may be taken based on the prior access patterns.
摘要:
Exemplary methods for minimizing contention among multiple threads include maintaining a plurality of linked lists of elements, each linked list corresponding to one of a plurality of threads accessing cache entries, each element of each linked list corresponding to one of the cache entries, wherein each linked list comprises a head element and a tail element, the head element corresponding to a most recently used (MRU) cache entry among all cache entries accessed by a corresponding thread, and the tail element corresponding to a least recently used cache entry among all cache entries accessed by the corresponding thread. In response to a cache eviction request, determining a LRU cache entry among the plurality of cache entries based on values accessed from one or more of the tail elements of the linked lists, and evicting the determined LRU cache entry by populating the determined LRU cache entry with the new data.
摘要:
Techniques for improving data compression of a storage system are described herein. According to one embodiment, a first sequence of data is partitioned into a plurality of data chunks in a first sequence order according to a predetermined chunking algorithm. The similarity of the data chunks is determined based on data patterns of the data chunks. The data chunks are reorganized into a second sequence order based on the similarity of the data chunks, the second sequence order being different from the first sequence order. The reorganized data chunks are compressed in the second sequence order into a second sequence of data, such that similar data chunks are stored and compressed together within the second sequence of data.
摘要:
A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criteria. The memory is coupled to the processor and configured to provide the processor with instructions.
摘要:
Techniques for balancing data compression and read performance of data chunks of a storage system are described herein. According to one embodiment, similar data chunks are identified based on sketches of a plurality of data chunks stored in the storage system. A first portion of the similar data chunks as a first group is associated with a first storage area. The first storage area is associated with one or more data chunks that are dissimilar to the first group but are likely accessed together. The first group of the similar data chunks and its associated dissimilar data chunks are compressed and stored in the first storage area.
摘要:
Exemplary methods for managing cache based on approximate least recently used (LRU) cache entries include maintaining a distributed data structure (DDS) of data elements, each corresponding to a cache entry of a plurality of cache entries, wherein each data element can be atomically accessed by multiple threads. In response to a cache eviction request from a first thread, determining an approximately LRU cache entry among the cache entries based on values atomically accessed from a first subset of the DDS of data elements, wherein the first subset of the DDS is atomically accessed using an atomic instruction without acquiring a lock to prevent another thread from accessing the first subset of the DDS to determine other approximately LRU cache entries among the cache entries, while allowing a second thread accessing a second subset of the DDS substantially concurrently. Evicting the determined approximately LRU cache entry.
摘要:
Stream locality delta compression is disclosed. A previous stream indicated locale of data segments is selected. A first data segment is then determined to be similar to a data segment in the stream indicated locale.
摘要:
A plurality of linked lists of elements is maintained corresponding to a plurality of threads accessing a plurality of cache entries, including a first linked list corresponding to a first thread and a second linked list corresponding to a second thread. Each element of each linked list corresponds to one of the plurality of cache entries. In response to the first thread accessing a cache entry corresponding to an element of the second linked list of elements, the element corresponding to the accessed cache entry is inserted to a head of the first linked list of elements. The element corresponding to the accessed cache entry is removed from the second linked list. One or more neighboring elements that were adjacent to the removed elements are re-linked on the second linked list.
摘要:
One embodiment provides a document management system comprising a storage system to store one or more encrypted documents, at least a first portion of a first encrypted document encrypted using a first encryption key, and an encryption key manager to manage a set of encryption keys for the documents on the storage system, the encryption key manager further to discard the first encryption key to provide secure removal of the portion of the encrypted document.
摘要:
A data processing system and methods for performing cache eviction are disclosed. An exemplary method includes maintaining a metadata set for each cache unit of a cache device, wherein the cache device comprises a plurality of cache units, each cache unit having a plurality of segments, calculating a score for each metadata set, and arranging the metadata sets in a list in ascending order from lowest score to highest score. The exemplary method further includes in response to determining that a cache eviction is to be performed, selecting a cache unit corresponding to the metadata set in the list having the lowest score, without recalculating a score for any of the metadata set, and evicting the selected cache unit. The metadata nay include, for example, segment count metadata, validity metadata, last access time (LAT) metadata, and hotness metadata.