Abstract:
Systems, methods, and computer-readable media for managing storing of data in a data storage system using a client tag. In some examples, a first portion of a data load as part of a transaction and a client identifier that uniquely identifies a client is received from the client at a data storage system. The transaction can be tagged with a client tag including the client identifier and the first portion of the data load can be stored in storage at the data storage system. A first log entry including the client tag is added to a data storage log in response to storing the first portion of the data load in the storage. The first log entry is then written from the data storage log to a persistent storage log in persistent memory which is used to track progress of storing the data load in the storage.
Abstract:
Embodiments include receiving an indication of a data storage module to be associated with a tenant of a distributed storage system, allocating a partition of a disk for data of the tenant, creating a first association between the data storage module and the disk partition, creating a second association between the data storage module and the tenant, and creating rules for the data storage module based on one or more policies configured for the tenant. Embodiments further include receiving an indication of a type of subscription model selected for the tenant, and selecting the disk partition to be allocated based, at least in part, on the subscription model selected for the tenant. More specific embodiments include generating a storage map indicating the first association between the data storage module and the disk partition and indicating the second association between the data storage module and the tenant.
Abstract:
A method can include receiving, at a workflow controller, a machine learning workflow, the machine learning workflow associated with a first task and a second task. The first task is training a machine learning model and the second task is deploying the model. The method can include segmenting, by the workflow controller, the machine learning workflow into a first sub-workflow associated with the first task and a second sub-workflow associated with the second task, assigning a first workflow agent to the first sub-workflow and assigning a second workflow agent to the second sub-workflow, selecting, by the first workflow agent and based on first resources needed to perform the first task, a first cluster for performing the first task and selecting, by the second workflow agent and based on second resources needed to perform the second task, a second cluster for performing the second task.
Abstract:
Embodiments include receiving an indication of a data storage module to be associated with a tenant of a distributed storage system, allocating a partition of a disk for data of the tenant, creating a first association between the data storage module and the disk partition, creating a second association between the data storage module and the tenant, and creating rules for the data storage module based on one or more policies configured for the tenant. Embodiments further include receiving an indication of a type of subscription model selected for the tenant, and selecting the disk partition to be allocated based, at least in part, on the subscription model selected for the tenant. More specific embodiments include generating a storage map indicating the first association between the data storage module and the disk partition and indicating the second association between the data storage module and the tenant.
Abstract:
Embodiments include obtaining at least one system metric of a distributed storage system, generating one or more recovery parameters based on the at least one system metric, identifying at least one policy associated with data stored in a storage node of a plurality of storage nodes in the distributed storage system, and generating a recovery plan for the data based on the one or more recovery parameters and the at least one policy. In more specific embodiments, the recovery plan includes a recovery order for recovering the data. Further embodiments include initiating a recovery process to copy replicas of the data from a second storage node to a new storage node, wherein the replicas of the data are copied according to the recovery order indicated in the recovery plan.
Abstract:
A method for assisting evaluation of anomalies in a distributed storage system is disclosed. The method includes a step of monitoring at least one system metric of the distributed storage system. The method further includes steps of maintaining a listing of patterns of the monitored system metric comprising patterns which previously did not result in a failure within one or more nodes of the distributed storage system, and, based on the monitoring, identifying a pattern (i.e., a time series motif) of the monitored system metric as a potential anomaly in the distributed storage system. The method also includes steps of automatically (i.e. without user input) performing a similarity search to determine whether the identified pattern satisfies one or more predefined similarity criteria with at least one pattern of the listing, and, upon positive determination, excepting the identified pattern from being identified as the potential anomaly.
Abstract:
Systems, methods, and computer-readable media for managing storing of data in a data storage system using a client tag. In some examples, a first portion of a data load as part of a transaction and a client identifier that uniquely identifies a client is received from the client at a data storage system. The transaction can be tagged with a client tag including the client identifier and the first portion of the data load can be stored in storage at the data storage system. A first log entry including the client tag is added to a data storage log in response to storing the first portion of the data load in the storage. The first log entry is then written from the data storage log to a persistent storage log in persistent memory which is used to track progress of storing the data load in the storage.
Abstract:
Systems, methods, and computer-readable storage media are provided for storing machine learned models in a tiered storage. The model serving network evaluates where the models should be stored based on the model corresponding service level agreement. The model is generally stored at the lowest tiered storage device that is still capable of satisfying the model's service level agreement. In this way, the model serving network aims to store data that achieves the cheapest cost.
Abstract:
Aspects of the subject technology relate to ways to determine the optimal storage of data structures in a hierarchy of memory types. In some aspects, a process of the technology can include steps for identifying a retrieval cost associated with retrieving a field in an object from data storage, comparing the retrieval cost for the field to a cost threshold for storing data in persistent memory, and selectively storing the field in either a persistent memory device or a non-persistent memory device based on a comparison of the retrieval cost for the field to the cost threshold. Systems and machine-readable media are also provided.
Abstract:
Aspects of the disclosed technology relate to ways to determine the optimal storage of data structures across different memory device is associated with physically disparate network nodes. In some aspects, a process of the technology can include steps for receiving a first retrieval request for a first object, searching a local PMEM device for the first object based on the first retrieval request, in response to a failure to find the first object on the local PMEM device, transmitting a second retrieval request to a remote node, wherein the second retrieval request is configured to cause the remote node to retrieve the first object from a remote PMEM device. Systems and machine-readable media are also provided.