Abstract:
The present disclosure relates to the assignment or generation of reducer virtual machines after the “map” phase is substantially complete in MapReduce. Instead of a priori placement, the distribution of keys over the mapper virtual machines after the “map” phase can be used to efficiently place reducer tasks in a virtualized cloud infrastructure such as OpenStack. By solving a constraint optimization problem, reducer VMs can be optimally assigned to process keys subject to certain constraints. In particular, the present disclosure describes a special variable matrix. Furthermore, the present disclosure describes several possible cost matrices for representing the costs determined based on the key distribution over the mapper VMs (and other suitable factors).
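A minimal sketch of the idea follows; all variable names, the greedy heuristic, and the per-VM capacity limit are assumptions for illustration and stand in for the disclosure's actual constraint-optimization formulation. It builds a cost matrix from the post-map key distribution and mapper-to-reducer network distances, then assigns each key to a reducer VM that minimizes transfer cost subject to capacity.

```python
# Hypothetical sketch: assign keys to reducer VMs from the post-"map" key
# distribution. key_counts[m][k] is the number of records for key k emitted
# by mapper VM m; dist[m][r] is a network-distance weight between mapper VM m
# and candidate reducer VM r. Both inputs are assumed for illustration.

def build_cost_matrix(key_counts, dist):
    """cost[r][k]: total weighted traffic if reducer VM r processes key k."""
    num_mappers, num_keys = len(key_counts), len(key_counts[0])
    num_reducers = len(dist[0])
    cost = [[0.0] * num_keys for _ in range(num_reducers)]
    for m in range(num_mappers):
        for k in range(num_keys):
            for r in range(num_reducers):
                cost[r][k] += key_counts[m][k] * dist[m][r]
    return cost

def assign_keys(cost, capacity):
    """Greedy stand-in for the constraint solver: each reducer VM takes at
    most `capacity` keys (assumes capacity * num_reducers >= num_keys)."""
    num_reducers, num_keys = len(cost), len(cost[0])
    load = [0] * num_reducers
    assignment = {}  # key index -> reducer VM index (the "variable matrix")
    # place the most expensive keys first, each on the cheapest VM with room
    for k in sorted(range(num_keys), key=lambda k: -max(row[k] for row in cost)):
        candidates = [r for r in range(num_reducers) if load[r] < capacity]
        best = min(candidates, key=lambda r: cost[r][k])
        assignment[k] = best
        load[best] += 1
    return assignment
```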
Abstract:
In one embodiment, a device receives information regarding a data set to be processed by a map-reduce process. The device generates a set of virtual clusters for the map-reduce process based on network bandwidths between nodes of the virtual clusters, each node of the virtual clusters corresponding to a resource device, and associates the data set with a task of the map-reduce process. The device then schedules the execution of the task by a node of the virtual clusters based on the network bandwidth between that node and a source node on which the data set resides.
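A brief illustrative sketch of the scheduling step (the function name and the bandwidth table are assumptions, not the embodiment's actual interfaces): among the nodes of the virtual cluster, pick the one with the highest measured bandwidth to the node holding the data set.

```python
# Hypothetical sketch: choose the execution node for a map-reduce task by
# favoring the highest network bandwidth to the source node holding the data.
# `bandwidth[(a, b)]` is an assumed lookup of measured bandwidth between nodes.

def schedule_task(source_node, cluster_nodes, bandwidth):
    """Return the cluster node with the best network path to the data set."""
    return max(cluster_nodes,
               key=lambda node: bandwidth.get((source_node, node), 0.0))

# Example with made-up numbers:
bw = {("src", "n1"): 1.0, ("src", "n2"): 10.0}
print(schedule_task("src", ["n1", "n2"], bw))  # -> "n2"
```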
Abstract:
Systems, methods, and computer-readable media for managing the storing of data in a data storage system using a client tag. In some examples, a first portion of a data load that is part of a transaction, along with a client identifier that uniquely identifies a client, is received from the client at a data storage system. The transaction can be tagged with a client tag including the client identifier, and the first portion of the data load can be stored in storage at the data storage system. A first log entry including the client tag is added to a data storage log in response to storing the first portion of the data load in the storage. The first log entry is then written from the data storage log to a persistent storage log in persistent memory, which is used to track the progress of storing the data load in the storage.
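The flow can be pictured with a short sketch; the class, field, and file names below are assumptions rather than the system's actual interfaces. It stores a portion of the data load, appends a client-tagged entry to an in-memory data storage log, and then writes that entry to a persistent log used to track the transaction's progress.

```python
# Hypothetical sketch of the client-tag logging flow described above.

import json
from dataclasses import dataclass

@dataclass
class LogEntry:
    client_id: str        # client tag: uniquely identifies the client
    transaction_id: str
    portion_index: int

class DataStorageSystem:
    def __init__(self, persistent_log_path):
        self.storage = {}            # where data-load portions are stored
        self.data_storage_log = []   # in-memory data storage log
        self.persistent_log_path = persistent_log_path

    def store_portion(self, client_id, transaction_id, portion_index, payload):
        # 1. store the portion of the data load
        self.storage[(transaction_id, portion_index)] = payload
        # 2. add a client-tagged entry to the data storage log
        entry = LogEntry(client_id, transaction_id, portion_index)
        self.data_storage_log.append(entry)
        # 3. write the entry to the persistent storage log (progress tracking)
        with open(self.persistent_log_path, "a") as log:
            log.write(json.dumps(entry.__dict__) + "\n")
```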
Abstract:
Embodiments include receiving an indication of a data storage module to be associated with a tenant of a distributed storage system, allocating a partition of a disk for data of the tenant, creating a first association between the data storage module and the disk partition, creating a second association between the data storage module and the tenant, and creating rules for the data storage module based on one or more policies configured for the tenant. Embodiments further include receiving an indication of a type of subscription model selected for the tenant, and selecting the disk partition to be allocated based, at least in part, on the subscription model selected for the tenant. More specific embodiments include generating a storage map indicating the first association between the data storage module and the disk partition and indicating the second association between the data storage module and the tenant.
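A minimal sketch of these associations (the subscription tiers, partition sizes, and class names are assumptions for illustration): a storage module is tied to a tenant and to a partition sized by the tenant's subscription model, rules are derived from the tenant's policies, and both associations are recorded in a storage map.

```python
# Hypothetical sketch: onboard a data storage module for a tenant.

SUBSCRIPTION_PARTITION_GB = {"basic": 100, "standard": 500, "premium": 2000}

class StorageOrchestrator:
    def __init__(self):
        self.storage_map = []  # records of module <-> partition <-> tenant

    def onboard_module(self, module_id, tenant_id, subscription, policies):
        # select the disk partition based on the tenant's subscription model
        partition = {
            "tenant": tenant_id,
            "size_gb": SUBSCRIPTION_PARTITION_GB[subscription],
        }
        # rules for the module derived from the tenant's policies (pass-through)
        rules = {"module": module_id, "policies": list(policies)}
        # record the module<->partition and module<->tenant associations
        self.storage_map.append(
            {"module": module_id, "partition": partition, "tenant": tenant_id}
        )
        return rules
```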
Abstract:
In one embodiment, a method for FPGA-accelerated serverless computing comprises receiving, from a user, a definition of a serverless computing task comprising one or more functions to be executed. A task scheduler performs an initial placement of the serverless computing task to a first host determined to be a first optimal host for executing the serverless computing task. The task scheduler determines a supplemental placement of a first function to a second host determined to be a second optimal host for accelerating execution of the first function, wherein the first function is not able to be accelerated by one or more FPGAs in the first host. The serverless computing task is executed on the first host and the second host according to the initial placement and the supplemental placement.
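The two-step placement can be sketched as follows (host names, the selection heuristic, and the data structures are assumptions): place the whole task on the host that can accelerate the most of its functions, then give any function that host cannot accelerate a supplemental placement on a host whose FPGAs can.

```python
# Hypothetical sketch of initial + supplemental placement for a serverless task.

def place_task(task_functions, hosts):
    """hosts: dict host_name -> set of function names its FPGAs accelerate.
    Returns (initial_host, {function: supplemental_host})."""
    # initial placement: host accelerating the most of the task's functions
    initial = max(hosts, key=lambda h: len(hosts[h] & set(task_functions)))
    supplemental = {}
    for fn in task_functions:
        if fn not in hosts[initial]:        # not accelerable on the first host
            candidates = [h for h in hosts if fn in hosts[h]]
            if candidates:
                supplemental[fn] = candidates[0]
    return initial, supplemental

# Example: host A accelerates f1, host B accelerates f2
print(place_task(["f1", "f2"], {"A": {"f1"}, "B": {"f2"}}))
# -> ('A', {'f2': 'B'})
```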
Abstract:
A method can include receiving, at a workflow controller, a machine learning workflow, the machine learning workflow being associated with a first task and a second task. The first task is training a machine learning model, and the second task is deploying the model. The method can include segmenting, by the workflow controller, the machine learning workflow into a first sub-workflow associated with the first task and a second sub-workflow associated with the second task; assigning a first workflow agent to the first sub-workflow and a second workflow agent to the second sub-workflow; selecting, by the first workflow agent and based on first resources needed to perform the first task, a first cluster for performing the first task; and selecting, by the second workflow agent and based on second resources needed to perform the second task, a second cluster for performing the second task.
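A compact sketch of the selection step (the resource figures, cluster names, and helper are assumptions): each sub-workflow's agent picks a cluster whose free resources cover its task's requirements, so training and deployment can land on different clusters.

```python
# Hypothetical sketch: per-sub-workflow cluster selection by required resources.

def select_cluster(required, clusters):
    """Pick the first cluster whose free resources cover the requirement."""
    for name, free in clusters.items():
        if all(free.get(k, 0) >= v for k, v in required.items()):
            return name
    raise RuntimeError("no cluster satisfies the resource requirement")

workflow = [
    {"task": "train",  "needs": {"gpus": 4, "cpus": 16}},  # first sub-workflow
    {"task": "deploy", "needs": {"gpus": 0, "cpus": 2}},   # second sub-workflow
]
clusters = {"edge-cluster": {"cpus": 8}, "gpu-cluster": {"gpus": 8, "cpus": 64}}

# one "agent" (here simply a function call) per sub-workflow
placement = {w["task"]: select_cluster(w["needs"], clusters) for w in workflow}
print(placement)  # {'train': 'gpu-cluster', 'deploy': 'edge-cluster'}
```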
Abstract:
One aspect of the disclosure relates to, among other things, a method for optimizing and provisioning a software-as-a-service (SaaS). The method includes determining a graph comprising interconnected stages for the SaaS, wherein each stage has a replication factor and one or more metrics that are associated with one or more service level objectives of the SaaS; determining a first replication factor associated with a first one of the stages which meets a first service level objective of the SaaS; adjusting the first replication factor associated with the first one of the stages based on the determined first replication factor; and provisioning the SaaS onto networked computing resources based on the graph and the replication factors associated with each stage.
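A simple sketch of the replication-factor adjustment (the linear throughput scaling, the stage names, and the numbers are assumptions): model each stage with a per-replica metric and raise its replication factor until the stage meets the service level objective, then provision accordingly.

```python
# Hypothetical sketch: adjust each stage's replication factor to meet an SLO.

from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    replicas: int
    per_replica_rps: float   # metric tied to the SLO (requests/second)
    downstream: list          # edges of the interconnected-stage graph

def meet_slo(stage, target_rps, max_replicas=64):
    """Increase the stage's replication factor until it covers the SLO."""
    while stage.replicas * stage.per_replica_rps < target_rps:
        if stage.replicas >= max_replicas:
            raise RuntimeError(f"{stage.name} cannot meet {target_rps} rps")
        stage.replicas += 1
    return stage.replicas

api = Stage("api", replicas=1, per_replica_rps=200.0, downstream=[])
front = Stage("frontend", replicas=2, per_replica_rps=150.0, downstream=[api])

for s in (front, api):                 # walk the stage graph
    meet_slo(s, target_rps=1000.0)     # first service level objective
print(front.replicas, api.replicas)    # 7 and 5 with these illustrative numbers
```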
Abstract:
In one embodiment, a method implements virtualized network functions in a serverless computing system having networked hardware resources. An interface of the serverless computing system receives a specification for a network service including a virtualized network function (VNF) forwarding graph (FG). A mapper of the serverless computing system determines an implementation graph comprising edges and vertices based on the specification. A provisioner of the serverless computing system provisions a queue in the serverless computing system for each edge. The provisioner further provisions a function in the serverless computing system for each vertex, wherein each of at least one of the functions reads incoming messages from at least one queue. The serverless computing system processes data packets by means of the queues and functions in accordance with the VNF FG.
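The edge-to-queue and vertex-to-function mapping can be illustrated with a small sketch; plain Python queues and closures stand in for the serverless platform's queues and functions, and the example VNFs are assumptions.

```python
# Hypothetical sketch: provision one queue per forwarding-graph edge and one
# function per vertex; each function reads packets from its incoming queue(s)
# and forwards results on its outgoing queue(s).

from queue import Queue

def provision(fg_edges, handlers):
    """fg_edges: list of (src_vnf, dst_vnf); handlers: vnf name -> callable."""
    queues = {edge: Queue() for edge in fg_edges}            # one queue per edge
    def make_worker(vnf):
        inbound  = [q for (s, d), q in queues.items() if d == vnf]
        outbound = [q for (s, d), q in queues.items() if s == vnf]
        def worker(packet=None):
            # read from incoming queues, apply the VNF, forward downstream
            pkts = [packet] if packet is not None else [q.get() for q in inbound]
            for p in pkts:
                out = handlers[vnf](p)
                for q in outbound:
                    q.put(out)
        return worker
    return queues, {vnf: make_worker(vnf) for vnf in handlers}

# Example forwarding graph: firewall -> nat
queues, fns = provision([("firewall", "nat")],
                        {"firewall": lambda p: p, "nat": lambda p: p.upper()})
fns["firewall"]("pkt")   # ingress packet enters at the firewall vertex
fns["nat"]()             # nat reads from the firewall->nat queue
```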