Abstract:
Processing and memory resources are optimized in a data storage system by reading a region of compressed data containing desired data from primary storage, writing the compressed data to a memory page, selectively decompressing the compressed data to retrieve the desired data, and writing the decompressed data back to the same page. State information about the start of the compressed data and the end of the decompressed data on the page is maintained to enable decompression to be halted and resumed on demand.
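The halt-and-resume behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes zlib as the compression format, places the compressed bytes at the tail of the page, and grows the decompressed bytes from the front. The names InPlacePage, CHUNK, comp_start, and decomp_end are hypothetical.

```python
import zlib

CHUNK = 64  # hypothetical: compressed bytes offered to the decompressor per step


class InPlacePage:
    """One memory page holding compressed data (tail) and decompressed data (front)."""

    def __init__(self, compressed, page_size=4096):
        if len(compressed) > page_size:
            raise ValueError("compressed region does not fit in the page")
        self.page = bytearray(page_size)
        # State kept per page: where unread compressed data starts,
        # and where decompressed data ends.
        self.comp_start = page_size - len(compressed)
        self.page[self.comp_start:] = compressed
        self.decomp_end = 0
        self._z = zlib.decompressobj()

    def read(self, offset, length):
        """Return decompressed bytes [offset, offset + length), decompressing only as far as needed."""
        target = offset + length
        while self.decomp_end < target and not self._z.eof:
            gap = self.comp_start - self.decomp_end  # free bytes between the two regions
            if gap <= 0:
                raise RuntimeError("page too small to decompress in place")
            chunk = bytes(self.page[self.comp_start:self.comp_start + CHUNK])
            # Cap the output so it neither overruns unread input nor passes the target.
            out = self._z.decompress(chunk, min(gap, target - self.decomp_end))
            consumed = len(chunk) - len(self._z.unconsumed_tail)
            if not out and consumed == 0:
                break  # no progress possible (for example, a truncated stream)
            self.page[self.decomp_end:self.decomp_end + len(out)] = out
            self.decomp_end += len(out)
            self.comp_start += consumed
        return bytes(self.page[offset:min(target, self.decomp_end)])


data = b"abc" * 1000
page = InPlacePage(zlib.compress(data))
first = page.read(0, 10)       # decompression halts once 10 bytes are available
halted_at = page.decomp_end
tail = page.read(2990, 10)     # decompression resumes from the saved state
```

Capping each step at the gap between the two regions is one simple way to keep decompressed output from overwriting compressed input that has not been read yet; a real system would need its own safety margin policy.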
Abstract:
A computer-implemented method is disclosed. The method starts with determining that a first container of a storage system is invalid. The method continues with the storage system setting a data recovery state for the first container to be en-queue, which indicates that data of at least one of the data segments needs to be recovered from the first container, and executing a process to recover any container having an en-queue data recovery state, and for each of the containers, to recover any valid data segment from the corresponding container. The process includes scanning the data segments of the first container to find valid data segments, moving or replicating the valid data segments to a second container, and setting the data recovery state for the first container to be complete once all the valid data segments are moved or replicated to the second container.
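The recovery flow above can be sketched as a small state machine. The sketch below is illustrative only: the classes Segment and Container, the function names, and the initial state "none" are assumptions; only the "en-queue" and "complete" states come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    fingerprint: str
    valid: bool = True


@dataclass
class Container:
    cid: int
    segments: list = field(default_factory=list)
    recovery_state: str = "none"  # assumed lifecycle: "none" -> "en-queue" -> "complete"


def mark_invalid(container):
    """Queue an invalid container so the recovery process will visit it."""
    container.recovery_state = "en-queue"


def run_recovery(containers, target):
    """Move valid segments out of every en-queue container into the target container."""
    moved = 0
    for container in containers:
        if container.recovery_state != "en-queue":
            continue
        for segment in container.segments:  # scan for valid segments
            if segment.valid:
                target.segments.append(segment)
                moved += 1
        container.recovery_state = "complete"
    return moved


bad = Container(1, [Segment("a"), Segment("b", valid=False), Segment("c")])
healthy = Container(2, [Segment("d")])
spare = Container(3)
mark_invalid(bad)
moved = run_recovery([bad, healthy], spare)
```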
Abstract:
Systems and methods for accessing data stored in multiple locations. A cache and a storage system are associated with an index. Entries in the index identify locations of data in both the cache and the storage system. When an index lookup occurs and an entry in the index identifies at least two locations for the data, the locations are ordered based on at least one factor and the data stored in the optimal location as determined from the at least one factor is returned.
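As an illustration of the lookup described above, the sketch below orders the locations recorded for an index entry by a single factor and returns the data from the best one. The factor used here (an estimated latency in milliseconds) and the names Location, lookup, index, and stores are assumptions; the abstract does not say which factors are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    tier: str          # "cache" or "storage"
    latency_ms: float  # hypothetical ordering factor


def lookup(index, stores, key):
    """Return (tier, data) from the lowest-latency location recorded for key."""
    locations = index.get(key)
    if not locations:
        return None
    ordered = sorted(locations, key=lambda loc: loc.latency_ms)
    best = ordered[0]
    return best.tier, stores[best.tier][key]


stores = {
    "cache": {"fp1": b"hot"},
    "storage": {"fp1": b"hot", "fp2": b"cold"},
}
index = {
    "fp1": [Location("storage", 8.0), Location("cache", 0.2)],  # two locations
    "fp2": [Location("storage", 8.0)],                           # one location
}
hit = lookup(index, stores, "fp1")
```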
Abstract:
Methods, systems, and apparatus for optimizing a cache memory device of a storage system are described. In one embodiment, a first base segment tree representing a first full backup including data and metadata describing the data is cached in a cache memory device. Subsequently, a plurality of incremental segment trees representing incremental backups to the first full backup are cached in the cache memory device. Each of the incremental segment trees corresponds to changes to the first full backup, and the first base segment tree is not modified in response to the changes. At least two of the incremental segment trees are merged into an updated incremental segment tree to reduce the storage space of the cache memory device needed to store the incremental segment trees. The updated incremental segment tree comprises the data and metadata represented by the two or more merged incremental segment trees.
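The merge step above can be illustrated with a deliberately simplified model in which each incremental tree is flattened to a mapping from block offset to segment fingerprint. That flattening, and the name merge_incrementals, are assumptions made for the sketch; real segment trees are multi-level structures.

```python
def merge_incrementals(*trees):
    """Merge incremental trees, given oldest first, into one; later changes win."""
    merged = {}
    for tree in trees:
        merged.update(tree)
    return merged


base = {0: "fpA", 1: "fpB", 2: "fpC"}   # base tree, never modified
inc1 = {1: "fpB1"}                      # first incremental: block 1 changed
inc2 = {1: "fpB2", 2: "fpC2"}           # second incremental: blocks 1 and 2 changed
merged = merge_incrementals(inc1, inc2)
```

In this model the merged tree holds two entries where the separate incrementals held three, which is the space saving the abstract refers to.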
Abstract:
Physical storage is replaced online in a data integrity storage system comprising RAID groups of physical disks in separate enclosures (shelves). All disks of a RAID group are located on a corresponding shelf, and each shelf is mapped by an internal file system to a separate independent logical address space partitioned into a plurality of blocks forming a blockset containing data and metadata. Source shelf disk data is moved online to disks of a target shelf using invulnerable data movement that ensures the integrity of the data, and source shelf blockset metadata is migrated to a corresponding target shelf blockset. After verifying the correctness of the target data and metadata, the source shelf and blockset are removed.
Abstract:
Methods, systems, and apparatus for providing data storage services of a storage system are described. In one embodiment, a first file representing a first full backup including data and metadata describing the data is cached in a cache memory device as a first segment tree having a plurality of layers of nodes in a tree hierarchy. A second file representing an incremental backup of the first full backup is cached as a second segment tree in the cache memory device. The second segment tree describes changes of the data and the metadata of the incremental backup in view of the data and the metadata of the first full backup without caching any of the nodes of the first segment tree again. The first and second segment trees are collectively used to represent a second full backup based on the incremental backup and the first full backup.
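One way to picture how two trees collectively represent a second full backup is a layered lookup: consult the incremental tree first and fall back to the base tree. The sketch below uses Python's collections.ChainMap over flat offset-to-fingerprint mappings; both the flat model and the variable names are assumptions, not the structure the abstract describes.

```python
from collections import ChainMap

base_tree = {0: "fpA", 1: "fpB", 2: "fpC"}   # first full backup
incremental_tree = {1: "fpB-new"}            # only the changed entry is stored

# Lookups try the incremental tree first, then fall back to the base tree.
second_full = ChainMap(incremental_tree, base_tree)
resolved = {offset: second_full[offset] for offset in sorted(second_full)}
```

No base entry is copied into the incremental mapping, and the base mapping is left untouched.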
Abstract:
Embodiments are directed to a method of minimizing latency and input/output (I/O) operations in a data storage system by defining a sparse metadata segment tree to identify changed data blocks, wherein a full version of the tree is stored in a memory and modified versions of the tree are stored in cache memory, and using the sparse metadata segment tree to perform at least one data storage application including file verification, file replication, file restores, and file system snapshots.
Abstract:
A single virtual storage device file system that abstracts multiple RAID groups of physical storage devices into one virtual device and one first blockset having a plurality of data blocks in a contiguous linear address space is converted into a multiple virtual device file system that abstracts the multiple RAID groups of physical storage devices as separate multiple virtual storage devices each having a separate second blockset and address space, by migrating data in allocated blocks at boundaries of the physical storage device groups to free blocks, partitioning the first blockset at the boundaries into the multiple second blocksets, updating the block metadata of each block, and rebuilding the file system using the block metadata to generate second blockset metadata.
Abstract:
Techniques to perform segment index lookups are disclosed. In various embodiments, for each of one or more segment index entries included in a first on-disk segment index, a corresponding set of values is stored in a bloom filter. The bloom filter is used to determine, prior to performing an on-disk segment lookup of the segment index with respect to a given segment, whether each location in the bloom filter that is associated with the given segment has been set to said corresponding set of values. An on-disk lookup is performed in parallel on a second on-disk segment index that is not included in said subset of on-disk segment indexes, each of which has associated therewith a corresponding bloom filter.
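The filter check described above can be sketched as follows. This covers only the bloom filter gate in front of one index, not the parallel lookup of a second index. The class names, the filter size, the number of hash functions, and the use of SHA-256 are all assumptions, and a dict stands in for the on-disk index.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class SegmentIndex:
    """Stand-in for an on-disk segment index guarded by a bloom filter."""

    def __init__(self, entries):
        self._on_disk = dict(entries)  # assumption: a dict simulates the disk index
        self.disk_reads = 0
        self.filter = BloomFilter()
        for fingerprint in self._on_disk:
            self.filter.add(fingerprint)

    def lookup(self, fingerprint):
        if not self.filter.might_contain(fingerprint):
            return None                # skip the disk read entirely
        self.disk_reads += 1           # simulated on-disk lookup
        return self._on_disk.get(fingerprint)


idx = SegmentIndex({b"fp1": 10, b"fp2": 20})
found = idx.lookup(b"fp1")
missing = idx.lookup(b"not-stored")
```

A bloom filter can report false positives but never false negatives, so a negative answer safely avoids the disk read while a positive answer still requires it.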