Abstract:
A computer-implemented method according to one embodiment includes identifying a plurality of segment files within an object storage system, determining all data blocks associated with the plurality of segment files within the object storage system, and mapping all the data blocks associated with the plurality of segment files to a single new file within the object storage system.
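The mapping step can be illustrated with a minimal sketch. It assumes a hypothetical in-memory block map (`ObjectStore.block_map`) and made-up segment names; a real object store would keep this mapping in its own metadata layer. The point shown is that the new file is built by re-pointing block references rather than by copying data.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    # Hypothetical block map: file name -> ordered list of block IDs.
    block_map: dict = field(default_factory=dict)

    def merge_segments(self, segment_names, new_name):
        """Map every block of the given segment files to one new file."""
        blocks = []
        for name in segment_names:
            # Collect block references only; no block data is copied.
            blocks.extend(self.block_map[name])
        self.block_map[new_name] = blocks
        return blocks


store = ObjectStore({"seg-0": [10, 11], "seg-1": [12], "seg-2": [13, 14]})
merged = store.merge_segments(["seg-0", "seg-1", "seg-2"], "merged.bin")
```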
Abstract:
A processor may identify a first directory in the UFO storage system. The first directory may include one or more subdirectories in one or more levels under the first directory. The one or more subdirectories may include a second directory that includes one or more objects. The first directory may be associated with a first inode, and the second directory may be associated with a second inode. The processor may perform a stat call on the second directory to determine metadata attributes for the one or more objects that are stored in the second directory. The metadata attributes for the one or more objects may be stored in the second inode. The processor may add the metadata attributes for the one or more objects to the first inode.
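As a rough sketch of the metadata promotion described here, the following models inodes as plain Python objects. The `Inode` class, `stat_directory`, and `promote_metadata` are hypothetical stand-ins, not a real file system API; the `prefix` argument is an assumed way to keep promoted entries distinguishable in the first inode.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    # Hypothetical inode: object name -> metadata attributes.
    attrs: dict = field(default_factory=dict)


def stat_directory(inode):
    """Stand-in for the stat call: return the metadata held in the inode."""
    return dict(inode.attrs)


def promote_metadata(first, second, prefix):
    """Add the second directory's object metadata to the first inode."""
    for name, meta in stat_directory(second).items():
        first.attrs[f"{prefix}/{name}"] = meta


first_inode = Inode()
second_inode = Inode({"obj1": {"size": 42}, "obj2": {"size": 7}})
promote_metadata(first_inode, second_inode, "sub/dir")
```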
Abstract:
An object-based data storage system includes a memory and a processor for executing machine executable instructions configured for implementing logical containers for data objects each having a global identifier. The containers are configured for storing metadata including a first parameterization value descriptive of a number of storage nodes and a second parameterization value descriptive of a classification of the data objects. The machine executable instructions are further configured for implementing a first object storage ring for addressing storage locations across the multiple storage nodes using a surjective function. Execution of the instructions causes the processor to: instantiate the first object storage ring, receive the global identifier and the metadata by the first object storage ring, and generate a storage address by the first object storage ring for the data object using the global identifier, the first parameterization value and the second parameterization value as input to the surjective function.
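One way to picture the address generation is a hash reduced modulo the node count. This is only a sketch under assumptions: the abstract does not name the surjective function, so SHA-256 with a modulo is swapped in here, and `storage_address`, `num_nodes`, and `data_class` are illustrative names for the global identifier and the two parameterization values.

```python
import hashlib


def storage_address(global_id: str, num_nodes: int, data_class: str) -> int:
    """Map an object to a node index in range(num_nodes).

    Assumed function: hash the classification and identifier together, then
    reduce modulo the node count so every node index can be reached.
    """
    key = f"{data_class}/{global_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


addr = storage_address("obj-123", 8, "genomic")
```

The same inputs always yield the same address, so any node can recompute an object's location without a lookup table.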
Abstract:
In one general embodiment, a computer program product for sharing a data management policy with a load balancer comprises a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se. Additionally, the program instructions are executable by a processor to cause the processor to perform a method comprising analyzing, by the processor, a plurality of data management factors within an object-based storage system, determining, by the processor, a data management policy for predetermined data within the object-based storage system, based on the analyzing, and sharing, by the processor, the data management policy for the predetermined data with a load balancer associated with the object-based storage system.
Abstract:
A computer-implemented method is provided for concurrent file and object protocol access. The method includes receiving a notification that indicates storage of an object by an object-based client, and creating a clone of the object. Also, the method includes providing a file-based client access to the clone of the object. Further, the method includes returning, in response to one or more read requests for the object received from one or more object-based clients while the file-based client modifies the clone of the object, the object to the one or more object-based clients. Moreover, the method includes, after the file-based client has finished modifying the clone of the object, replacing the object with an updated object based on the modified clone of the object.
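The clone-then-replace flow can be sketched with an in-memory store. `DualAccessStore` and its method names are hypothetical; the sketch treats the call to `put_object` as the storage notification and only shows that object clients keep reading the original until the modified clone is committed.

```python
class DualAccessStore:
    def __init__(self):
        self.objects = {}  # name -> bytes served to object-based clients
        self.clones = {}   # name -> bytearray edited by a file-based client

    def put_object(self, name, data):
        self.objects[name] = bytes(data)
        # Storing an object triggers creation of a clone for file access.
        self.clones[name] = bytearray(data)

    def get_object(self, name):
        # Object reads always see the last committed version.
        return self.objects[name]

    def open_clone(self, name):
        return self.clones[name]

    def commit_clone(self, name):
        # Replace the object with an updated object built from the clone.
        self.objects[name] = bytes(self.clones[name])


store = DualAccessStore()
store.put_object("report", b"v1")
clone = store.open_clone("report")
clone[:] = b"v2"                          # file client modifies the clone
during_edit = store.get_object("report")  # object client still reads the original
store.commit_clone("report")
after_commit = store.get_object("report")
```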
Abstract:
A method for more efficiently storing genomic data includes designating multiple different data storage techniques for storing genomic data generated by a genomic pipeline. The method further identifies a file, made up of multiple blocks, generated by the genomic pipeline. The method determines which data storage technique is optimal for storing each block of the file. In doing so, the method may consider the type of the file, the stage of the genomic pipeline that generated the file, the access frequency for blocks of the file, the most accessed blocks of the file, and the like. The method stores each block using the data storage technique determined to be optimal after completion of a designated stage of the genomic pipeline, such that blocks of the file are stored using several different data storage techniques. A corresponding system and computer program product are also disclosed.
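A per-block selection of this kind might look like the sketch below. The technique names, the access threshold of 100, and the file type and stage labels are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
def choose_technique(file_type, stage, access_count):
    """Pick a storage technique for one block (all rules are illustrative)."""
    if access_count >= 100:
        return "replicated"          # assumed choice for frequently read blocks
    if file_type == "fastq" and stage == "alignment":
        return "compressed-archive"  # assumed choice for cold raw reads
    return "erasure-coded"           # assumed default


def plan_file(file_type, stage, block_access_counts):
    """Return one technique per block, so a single file may mix techniques."""
    return [choose_technique(file_type, stage, n) for n in block_access_counts]


plan = plan_file("fastq", "alignment", [250, 3, 0])
```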
Abstract:
In one embodiment, a computer program product includes a computer-readable storage medium having program instructions embodied therewith. The embodied program instructions are executable by a processor to cause the processor to receive, by the processor, a first job request. The embodied program instructions are also executable by the processor to cause the processor to analyze, by the processor, the first job request to determine a user skill level of a user that submitted the first job request. Moreover, the embodied program instructions are executable by the processor to cause the processor to admit, by the processor, the first job request to a data analytics system and/or a data storage system in a specified order with respect to other received job requests based on at least the user skill level of the user that submitted the first job request. Other systems and methods are described in accordance with more embodiments.
Abstract:
Various embodiments provide a framework for Quality of Service (QoS) within and between globally distributed computing components, implemented by a processor. At least one resource required for a computing process is estimated by examining information associated with a resource template. A storlet is allocated as the at least one resource at a storage node, thereby offloading computing elements to at least one storage unit. The allocated storlet performs the computing process according to constraints delineated by the resource template.
Abstract:
A mechanism is provided for enabling separation of compute infrastructure built within a geographically located storage device. A determination is made as to whether a compute request originates from a geographical location that is the same as a geographical location of the geographically located storage device. Responsive to the compute request originating from a geographical location different from the geographical location of the geographically located storage device, a determination is made as to whether the compute request complies with governing requirements that govern the geographically located storage device. Responsive to the compute request complying with the requirements that govern the geographically located storage device, a determination is made as to whether the compute request is for data retrieval only. Responsive to the compute request being for data retrieval only, the requested data is gathered from data storage of the geographically located storage device and sent to a requesting client.
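The chain of determinations can be sketched as a single function. `ComputeRequest`, `handle`, and the result labels are hypothetical. The abstract only describes the path for a compliant, retrieval-only request from a different location, so the other branches below are placeholders rather than described behavior.

```python
from dataclasses import dataclass


@dataclass
class ComputeRequest:
    origin: str           # geographical location the request comes from
    compliant: bool       # whether it satisfies the governing requirements
    retrieval_only: bool  # whether it only asks for data
    key: str              # which stored item is requested


def handle(request, device_location, data):
    if request.origin == device_location:
        return ("local", None)      # placeholder: not covered by the abstract
    if not request.compliant:
        return ("rejected", None)   # placeholder: not covered by the abstract
    if request.retrieval_only:
        # Gather the requested data and send it to the requesting client.
        return ("data", data[request.key])
    return ("other", None)          # placeholder: not covered by the abstract


storage = {"sample": b"payload"}
remote_read = ComputeRequest("EU", True, True, "sample")
result = handle(remote_read, "US", storage)
```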
Abstract:
In one embodiment, a computer program product includes a computer readable storage medium having program instructions embodied therewith. The embodied program instructions are executable by a processor to cause the processor to receive, by the processor, a first job request, and analyze, by the processor, the first job request to determine: an estimated complexity of the first job request based on one or more attributes of the first job request and a user skill level of a user that submitted the first job request. Moreover, the embodied program instructions are executable by the processor to admit, by the processor, the first job request to a data analytics system and/or a data storage system in a specified order with respect to other received job requests based on at least: the estimated complexity of the first job request, and the user skill level of the user that submitted the first job request.
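Ordered admission based on complexity and skill level could be sketched with a priority queue. The scoring rule `complexity - skill` is purely an assumption made for this example, as are the `AdmissionQueue` class and the numeric scales; the abstract states only that both factors influence the order.

```python
import heapq
from itertools import count


class AdmissionQueue:
    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves submission order

    def submit(self, job_id, complexity, skill):
        # Assumed rule: lower (complexity - skill) is admitted first.
        score = complexity - skill
        heapq.heappush(self._heap, (score, next(self._seq), job_id))

    def admit_next(self):
        return heapq.heappop(self._heap)[2]


queue = AdmissionQueue()
queue.submit("job-a", complexity=5, skill=1)
queue.submit("job-b", complexity=2, skill=3)
queue.submit("job-c", complexity=5, skill=4)
order = [queue.admit_next() for _ in range(3)]
```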