Abstract:
The techniques introduced here provide for efficient management of storage resources in a modern, dynamic data center through the use of virtual storage appliances. Virtual storage appliances perform storage operations and execute in or as a virtual machine on a hypervisor. A storage management system monitors a storage system to determine whether the storage system is satisfying a service level objective for an application. The storage management system then manages (e.g., instantiates, shuts down, or reconfigures) a virtual storage appliance on a physical server. The virtual storage appliance uses resources of the physical server to meet the storage related needs of the application that the storage system cannot provide. This automatic and dynamic management of virtual storage appliances by the storage management system allows storage systems to quickly react to changing storage needs of applications without requiring expensive excess storage capacity.
Abstract:
A method and apparatus for proxying search requests for a storage system and maintaining a central index for performing the search requests is described herein. An index manager on the storage system may initially produce the central index by examining each file in a file system and update the central index thereafter by examining only those files that have changed since the central index was initially produced or last updated. The index manager may receive a changed file list from a differencing layer configured for comparing snapshots of the file system at different time points to produce changed file lists. A search proxy module may receive search requests in a search protocol and proxy the search requests to a search engine by converting the search requests to another search protocol compatible with the search engine. The search engine may then use the central index for performing the search request.
Abstract:
Methods and system for securely capturing workloads at a live network for replaying at a test network. The disclosed system captures file system states and workloads of a live server at the live network. In one embodiment the captured data is anonymized to protect confidentiality of the data. A file system of a test server at the test network is mirrored from a captured state of the live server. An anonymized version of the captured workloads is replayed as a request to the test server. A lost or incomplete command is recreated from the states of the live server. An order of the commands during replay can be based on an order in the captured workload, or based on a causal relationship. Performance characteristics of the live network are determined based on the response to the replayed command.
Abstract:
A distributed object store in a network storage system uses location-independent global object identifiers (IDs) for stored data objects. The global object ID enables a data object to be seamlessly moved from one location to another without affecting clients of the storage system, i.e., “transparent migration”. The global object ID can be part of a multilevel object handle, which also can include a location ID indicating the specific location at which the data object is stored, and a policy ID identifying a set of data management policies associated with the data object. The policy ID may be associated with the data object by a client of the storage system, for example when the client creates the object, thus allowing “inline” policy management. An object location subsystem (OLS) can be used to locate an object when a client request does not contain a valid location ID for the object.
Abstract:
A distributed object store in a network storage system uses location-independent global object identifiers (IDs) for stored data objects. The global object ID enables a data object to be seamlessly moved from one location to another without affecting clients of the storage system, i.e., “transparent migration”. The global object ID can be part of a multilevel object handle, which also can include a location ID indicating the specific location at which the data object is stored, and a policy ID identifying a set of data management policies associated with the data object. The policy ID may be associated with the data object by a client of the storage system, for example when the client creates the object, thus allowing “inline” policy management. An object location subsystem (OLS) can be used to locate an object when a client request does not contain a valid location ID for the object.
Abstract:
A system and method reclaims unused storage space from a data container, such as a logical unit number (LUN) of a storage system. In particular, a novel technique is provided that allows a storage system to reclaim storage space not used by a client file system for which the storage system maintains storage, without requiring assistance from the client file system to determine storage usage. In other words, storage system may independently reclaim storage space not used by the client file system, without that file system's intervention.
Abstract:
Method and system for processing a plurality of requests for accessing “small files” stored at a storage device is provided. A user may define file size and each small file may include one or more blocks of data. The requests are sorted based on an address of a first data block for each small file. The sorted requests are then used to access the stored files, instead of accessing the requests based on when a request was received.
Abstract:
A system and method for nearly in-band search indexing. A network switch (or other intermediate network device) is configured to provide port mirroring so that data access requests directed to a storage system are forwarded to both the storage system and to a search appliance. The search appliance collects index information from the received data access requests to update a search index. As the search appliance is nearly in-band, i.e., not directly in-line of the data access request path, no increase of latency occurs for processing data access requests by the storage system.
Abstract:
Techniques for a data storage cluster and a method for deduplicating data in the data storage cluster in a scalable manner, by (among other things) using an epoch-based global chunk data structure, are disclosed herein. A global chunk data structure for an epoch is distributed and maintained at a plurality of metadata nodes within the data storage cluster. Fingerprints and identifiers of data chunks are written to the cluster after a particular epoch are written to delta chunk data structures stored in different metadata nodes of the cluster. When the data storage cluster advances to the next epoch, the global chunk data structure is updated using the delta chunk data structures. At any given time, data deduplication in the data storage cluster can be conducted based on the global chunk data structure for the current epoch.
Abstract:
A method and apparatus for proxying search requests for a storage system and maintaining a central index for performing the search requests is described herein. An index manager on the storage system may initially produce the central index by examining each file in a file system and update the central index thereafter by examining only those files that have changed since the central index was initially produced or last updated. The index manager may receive a changed file list from a differencing layer configured for comparing snapshots of the file system at different time points to produce changed file lists. A search proxy module may receive search requests in a search protocol and proxy the search requests to a search engine by converting the search requests to another search protocol compatible with the search engine. The search engine may then use the central index for performing the search request.