Abstract:
A runtime method is disclosed that dynamically sets up core containers and thread-to-core affinity for processes running on manycore coprocessors. The method is completely transparent to user applications and incurs low runtime overhead. The method is implemented within a user-space middleware that also performs scheduling and resource management for both offload and native applications using the manycore coprocessors.
Abstract:
Systems and methods for sorting data, including chunking unsorted data such that each chunk is of a size that fits within a last level cache of the system. One or more threads are instantiated in each physical core of the system, chunks assigned physical cores are distributed evenly across the threads on the physical cores. Subchunks in the physical cores are sorted using vector intrinsics, the subchunks being data assigned to the threads in the physical cores, and the subchunks are merged to generate sorted large chunks. A binary tree, which includes leaf nodes that correspond to the sorted large chunks, is built, leaf nodes are assigned to threads, and tree nodes are assigned to a circular buffer, wherein the circular buffer is lock and synchronization free. The large chunks are sorted to generate sorted data as output.
Abstract:
A method is provided for detecting abnormal changes in real-time in dynamic graphs. The method includes extracting, by a graph sampler, an active sampled graph from an underlying base graph. The method further includes merging, by a graph merger, the active sampled graph with graph updates within a predetermined recent time period to generate a merged graph. The method also includes computing, by a graph diameter computer, a diameter of the merged graph. The method additionally includes determining, by a graph diameter change determination device, whether a graph diameter change exists. The method further includes generating, by an alarm generator, a user-perceptible alarm responsive to the graph diameter change.
Abstract:
A graph storage and processing system is provided. The system includes a scalable, distributed, fault-tolerant, in-memory graph storage device for storing base graph data representative of graphs. The system further includes a real-time, in memory graph storage device for storing update graph data representative of graph updates for the graphs with respect to a time threshold. The system also includes an in-memory graph sampler for sampling the base graph data to generate sampled portions of the graphs and for storing the sampled portions of the graph. The system additionally includes a query manager for providing a query interface between applications and the system and for forming graph data representative of a complete graph from at least the base graph data and the update graph data, if any. The system also includes a graph computer for processing the sampled portions using batch-type computations to generate approximate results for graph-based queries.
Abstract:
A method is provided for controlling a compute cluster having a plurality of nodes. Each of the plurality of nodes has a respective computing device with a main server and one or more coprocessor-based hardware accelerators. The method includes receiving a plurality of jobs for scheduling. The method further includes scheduling the plurality of jobs across the plurality of nodes responsive to a knapsack-based sharing-aware schedule generated by a knapsack-based sharing-aware scheduler. The knapsack-based sharing-aware schedule is generated to co-locate together on a same computing device certain ones of the plurality of jobs that are mutually compatible based on a set of requirements whose fulfillment is determined using a knapsack-based sharing-aware technique that uses memory as a knapsack capacity and minimizes makespan while adhering to coprocessor memory and thread resource constraints.
Abstract:
A runtime method is disclosed that dynamically sets up core containers and thread-to-core affinity for processes running on manycore coprocessors. The method is completely transparent to user applications and incurs low runtime overhead. The method is implemented within a user-space middleware that also performs scheduling and resource management for both offload and native applications using the manycore coprocessors.
Abstract:
A method is disclosed to manage a multi-processor system with one or more manycore devices, by managing real-time bag-of-tasks applications for a cluster, wherein each task runs on a single server node, and uses the offload programming model, and wherein each task has a deadline and three specific resource requirements: total processing time, a certain number of manycore devices and peak memory on each device; when a new task arrives, querying each node scheduler to determine which node can best accept the task and each node scheduler responds with an estimated completion time and a confidence level, wherein the node schedulers use an urgency-based heuristic to schedule each task and its offloads; responding to an accept/reject query phase, wherein the cluster scheduler send the task requirements to each node and queries if the node can accept the task with an estimated completion time and confidence level; and scheduling tasks and offloads using a aging and urgency-based heuristic, wherein the aging guarantees fairness, and the urgency prioritizes tasks and offloads so that maximal deadlines are met.