Abstract:
Techniques are provided for storing files in a parallel computing system based on a user specification. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a specification from the distributed application indicating how the plurality of files should be stored, and storing one or more of the plurality of files in one or more storage nodes of a multi-tier storage system based on the specification. The plurality of files comprise a plurality of complete files and/or a plurality of sub-files. The specification can optionally be processed by a daemon executing on one or more nodes in the multi-tier storage system. The specification may, for example, identify one or more storage nodes where the plurality of files should be stored.
Abstract:
Improved techniques are provided for storing files in a parallel computing system using a list-based index to identify file replicas. A file and at least one replica of the file are stored in one or more storage nodes of the parallel computing system. An index for the file comprises at least one list comprising a pointer to a storage location of the file and a storage location of the at least one replica of the file. The file comprises one or more of a complete file and one or more sub-files. The index may also comprise a checksum value for one or more of the file and the replica(s) of the file. The checksum value can be evaluated to validate the file and/or the file replica(s). A query can be processed using the list.
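The list-based index described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the class name `ReplicaIndex`, its methods, the location strings, and the choice of SHA-256 as the checksum are all assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever checksum the index stores.
    return hashlib.sha256(data).hexdigest()


class ReplicaIndex:
    """Hypothetical index: file name -> list of (storage location, checksum)."""

    def __init__(self):
        self.entries = {}

    def add(self, name, location, data):
        # The first entry added is the file itself; later entries are replicas.
        self.entries.setdefault(name, []).append((location, checksum(data)))

    def locations(self, name):
        # A query is answered directly from the list of pointers.
        return [loc for loc, _ in self.entries.get(name, [])]

    def validate(self, name, read):
        # Return the locations whose current content matches the stored checksum.
        # `read` is a caller-supplied function mapping a location to bytes.
        return [
            loc
            for loc, digest in self.entries.get(name, [])
            if checksum(read(loc)) == digest
        ]
```

A caller would register the file and each replica with `add`, answer location queries with `locations`, and use `validate` to find which copies still match their recorded checksums.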
Abstract:
Cloud infrastructure of an information processing system comprises one or more processing devices implementing a plurality of virtual machines. The cloud infrastructure is configured to receive a processing job from a tenant, to obtain a first key specific to the tenant, to determine a second key utilizing information supplied by the tenant, and to encrypt one or more results of the processing job utilizing a combination of the first key and the second key. At least a portion of the second key is determined by at least one application that is run on at least one virtual machine of the cloud infrastructure in conjunction with performance of the processing job. The encrypted results of the processing job may be stored in a virtual memory of the cloud infrastructure and transmitted to the tenant.
Abstract:
Embodiments of the present invention provide a method of managing access of multiple client computers to a storage system that supports a limited number of logins. The method comprises, in response to a request to enable a subset of the clients to access resources of the storage system to perform a task, automatically configuring the storage system to provide the subset of the clients access to the resources, and, when the task is completed, automatically re-configuring the storage system so that the subset of the clients is no longer provided with access to the resources of the storage system.
Abstract:
Based on a count of the number of dirty pages in a cache memory, the dirty pages are written from the cache memory to a storage array at a rate having a component proportional to the rate of change in the number of dirty pages in the cache memory. For example, a desired flush rate is computed by adding a first term to a second term. The first term is proportional to the rate of change in the number of dirty pages in the cache memory, and the second term is proportional to the number of dirty pages in the cache memory. The rate component has a smoothing effect on incoming I/O bursts and permits cache flushing to occur at a higher rate closer to the maximum storage array throughput without a significant detrimental impact on client application performance.
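The two-term flush rate described above can be illustrated with a short sketch. The function name, the gains `k_rate` and `k_count`, the clamp at zero, and the optional `max_rate` cap are assumptions made for the example; the abstract gives no specific values.

```python
def desired_flush_rate(dirty_now, dirty_prev, dt,
                       k_rate=1.0, k_count=0.5, max_rate=None):
    """Return a flush rate in pages per second (hypothetical gains)."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # First term: proportional to the rate of change of the dirty-page count.
    rate_of_change = (dirty_now - dirty_prev) / dt
    first_term = k_rate * rate_of_change
    # Second term: proportional to the dirty-page count itself.
    second_term = k_count * dirty_now
    # Assumption: a negative sum is treated as "do not flush".
    rate = max(0.0, first_term + second_term)
    # Assumption: optionally cap the rate at the storage array's throughput.
    if max_rate is not None:
        rate = min(rate, max_rate)
    return rate
```

With these illustrative gains, a cache that grew from 800 to 1000 dirty pages over two seconds yields a rate term of 100 and a count term of 500 pages per second.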
Abstract:
Servers in a storage system store a nested multilayer directory structure, and a global index that is an abstract of the directory structure. The global index identifies respective portions of the directory structure that are stored in respective ones of the servers, and the global index identifies paths through the directory structure linking the respective portions. When a top-down search of the directory structure, performed in response to a client request, finds that a portion of the directory structure is offline, the global index is searched to discover portions of the directory structure that are located below the offline portion. The global index may also identify the respective server storing each of the respective portions of the directory structure, and may indicate whether or not each of the respective portions of the directory structure is known to be offline.
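The search of the global index for portions below an offline portion might look like the following sketch. The dictionary layout (portion id mapped to `server`, `offline`, and `children`), the portion names, and the function name are hypothetical; the abstract does not specify how the index is represented.

```python
def portions_below(global_index, offline_id):
    """Return ids of online portions linked below `offline_id`."""
    reachable = []
    stack = list(global_index[offline_id]["children"])
    while stack:
        pid = stack.pop()
        entry = global_index[pid]
        if not entry["offline"]:
            reachable.append(pid)
        # Keep descending: portions under an offline portion may be online.
        stack.extend(entry["children"])
    return sorted(reachable)


# Hypothetical index: portion id -> server, offline flag, child portions.
example_index = {
    "root":  {"server": "s1", "offline": False, "children": ["proj"]},
    "proj":  {"server": "s2", "offline": True,  "children": ["src", "docs"]},
    "src":   {"server": "s3", "offline": False, "children": ["tests"]},
    "docs":  {"server": "s2", "offline": True,  "children": []},
    "tests": {"server": "s1", "offline": False, "children": []},
}
```

Here a top-down search that stalls at the offline portion `proj` can still locate `src` and `tests` through the index, together with the servers that store them.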
Abstract:
Embodiments of the present invention are directed to techniques for selecting a data path over which to exchange information between a client device and a storage system by making a selection between a file system server (NAS) data path type (a first data path type) and a direct (SAN) data path type (a second data path type) based on one or more adjustable path selection factors and/or information regarding components of the computer system. For example, a data path may be selected based on a type of an input/output operation to be executed (i.e., whether the operation is a read operation or write operation) and/or any other suitable path selection factor.
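Selection by operation type can be sketched as a small policy lookup. The policy shown (reads over the direct path, writes over the file system server path) and the fallback when the direct path is unavailable are assumptions chosen for illustration; the abstract does not say which operation maps to which path.

```python
from enum import Enum


class PathType(Enum):
    NAS = "file system server path"  # first data path type
    SAN = "direct path"              # second data path type


# Hypothetical, adjustable policy: operation type -> preferred path type.
DEFAULT_POLICY = {"read": PathType.SAN, "write": PathType.NAS}


def select_path(op_type, policy=DEFAULT_POLICY, san_available=True):
    """Pick a data path for a single I/O operation."""
    choice = policy[op_type]
    # Assumption: fall back to the NAS path when the direct path is unusable.
    if choice is PathType.SAN and not san_available:
        return PathType.NAS
    return choice
```

Because the policy is passed in as a plain mapping, the selection factor can be adjusted without changing the selection logic.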
Abstract:
A write interface in a file server provides permission management for concurrent access to data blocks of a file, ensures correct use and update of indirect blocks in a tree of the file, preallocates file blocks when the file is extended, resolves access conflicts for concurrent reads and writes to the same block, and permits the use of pipelined processors. For example, a write operation includes obtaining a per-file allocation mutex (mutually exclusive lock), preallocating a metadata block, releasing the allocation mutex, issuing an asynchronous write request for writing to the file, waiting for the asynchronous write request to complete, obtaining the allocation mutex again, committing the preallocated metadata block, and releasing the allocation mutex. Because no locks are held while data is written to on-disk storage, and this data write takes the majority of the time, the method enhances concurrency while maintaining data integrity.
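The write sequence in the example can be modeled with a toy in-memory sketch. The class `FileWriter`, its fields, and the use of a thread pool to stand in for asynchronous disk I/O are assumptions for illustration only; a real file server would operate on on-disk metadata and data blocks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FileWriter:
    """Toy model of one file; all names here are hypothetical."""

    def __init__(self):
        self.alloc_mutex = threading.Lock()  # per-file allocation mutex
        self.preallocated = set()            # reserved, not yet committed
        self.committed = set()               # visible in the file's metadata
        self.storage = {}                    # stands in for on-disk storage
        self._io = ThreadPoolExecutor(max_workers=4)

    def _write_block(self, block_no, data):
        self.storage[block_no] = data

    def write(self, block_no, data):
        with self.alloc_mutex:               # obtain the allocation mutex
            self.preallocated.add(block_no)  # preallocate the metadata block
        # The mutex is released here: no lock is held during the data write.
        future = self._io.submit(self._write_block, block_no, data)
        future.result()                      # wait for the asynchronous write
        with self.alloc_mutex:               # obtain the allocation mutex again
            self.preallocated.discard(block_no)
            self.committed.add(block_no)     # commit the preallocated block

    def close(self):
        self._io.shutdown(wait=True)
```

In this model the mutex is held only for the two short metadata steps, so several writers can have their data writes in flight at the same time.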
Abstract:
Network servers in a cluster share the same network protocol address for incoming client requests, and in a data link layer protocol a reply of a client to a request from a server is returned to this same server. For example: (1) ports of the servers are clustered into one single network channel used for incoming and outgoing requests to and from the servers; or (2) ports of the servers are clustered into one single network channel used for incoming requests to the servers and a separate port of each of the servers is used for outgoing requests from each of the servers; or (3) logical ports of the servers are clustered into one network channel used for requests to the servers and a separate logical port of each of the servers is used for outgoing requests from each of the servers.
Abstract:
A shallow file is adapted for intensive read-only access to data of a primary file. The primary file resides in another file system or file server. The shallow file includes the data block mapping metadata of the primary file and a link to the primary file. To open the shallow file, the file system manager of the shallow file obtains a read lock on the primary file from the file system manager of the primary file. Then the file system manager of the shallow file may use the data block mapping in the shallow file to access the file data from the primary file in storage without participation of the file system manager of the primary file. This permits offloading of data protection services for secure and efficient storage of a backup copy of the file data.