Abstract:
Techniques generate memory-optimization logic for concurrent graph analysis. A computer analyzes domain-specific language logic that analyzes a graph having vertices and edges. The computer detects parallel execution regions that create thread locals. Each thread local is associated with a vertex or edge. For each parallel region, the computer calculates how much memory is needed to store one instance of each thread local. The computer generates instrumentation that determines how many threads are available and how many vertices and edges will create thread locals. The computer generates tuning logic that determines how much memory is originally needed for the parallel region based on how much memory is needed to store the one instance, how many threads are available, and graph size. The tuning logic detects a memory shortage based on the original amount of memory needed exceeding how much memory is available and accordingly adjusts the execution of the parallel region.
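The tuning step described above can be illustrated with a minimal sketch. The abstract does not give the formula or the adjustment strategy, so two assumptions are made here: the original memory need is taken as the product of instance size, thread count, and the number of vertices or edges that create thread locals, and the adjustment is to process the region in smaller batches. The function name `plan_parallel_region` and all parameter names are hypothetical.

```python
def plan_parallel_region(bytes_per_instance, num_threads, num_elements, available_bytes):
    """Return (originally_needed_bytes, batch_size) for one parallel region.

    Assumption (not from the abstract): each thread may hold one thread-local
    instance per vertex or edge, so the need is a simple product.
    """
    needed = bytes_per_instance * num_threads * num_elements
    if needed <= available_bytes:
        # No shortage: run the region over all elements at once.
        return needed, num_elements
    # Shortage detected. Assumed adjustment: shrink the batch so that one
    # batch of thread locals fits into the memory that is available.
    bytes_per_element = bytes_per_instance * num_threads
    batch_size = max(1, available_bytes // bytes_per_element)
    return needed, batch_size
```

With 16-byte thread locals, 4 threads, and 1000 vertices, the sketch estimates 64000 bytes; if only 32000 bytes are available, it halves the batch to 500 vertices.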
Abstract:
Techniques herein index data transferred during distributed graph processing. In an embodiment, a system of computers divides a directed graph into partitions. The system creates one partition per computer and distributes each partition to a computer. Each computer builds four edge lists that enumerate edges that connect the partition of the computer with a partition of a neighbor computer. Each of the four edge lists has edges of a direction, which may be inbound or outbound from the partition. Edge lists are sorted by identifier of the vertex that terminates or originates each edge. Each iteration of distributed graph analysis involves each computer processing its partition and exchanging edge data or vertex data with neighbor computers. Each computer uses an edge list to build a compactly described range of edges that connect to another partition. The computers exchange described ranges with their neighbors during each iteration.
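One way to picture the compactly described range is as a (start, count) pair over a sorted edge list. The sketch below assumes each edge is stored as a `(vertex_id, other_vertex_id)` tuple sorted by the first field; the function name `edge_range` and this tuple layout are illustrative assumptions, not details taken from the abstract.

```python
from bisect import bisect_left, bisect_right


def edge_range(sorted_edges, lo_vertex, hi_vertex):
    """Describe the edges whose key vertex lies in [lo_vertex, hi_vertex].

    sorted_edges must already be sorted by its first field. The result is a
    (start, count) pair that a neighbor could use instead of a full edge list.
    """
    keys = [edge[0] for edge in sorted_edges]
    start = bisect_left(keys, lo_vertex)
    end = bisect_right(keys, hi_vertex)
    return start, end - start
```

Because the list is sorted, all edges for a contiguous range of vertex identifiers are adjacent, which is what makes a two-number description sufficient.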
Abstract:
Systems and methods for interactive front-end graph analysis are provided herein. According to one embodiment, a front-end application receives, from a compiler, first meta-information for a particular graph analysis procedure, where the first meta-information identifies a set of input parameters for passing graph information to the particular graph analysis procedure. The front-end application registers, using the first meta-information, the particular graph analysis procedure as an available command. The front-end application also receives second meta-information that identifies, for each respective graph object of a set of one or more graph objects, a respective set of graph characteristics. In response to receiving a request to apply the particular graph analysis procedure to the set of one or more graph objects, the front-end application enforces a set of one or more constraints based on the first meta-information and the second meta-information.
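The registration and constraint check can be sketched as follows. The abstract does not specify what the meta-information looks like, so this sketch assumes the first meta-information maps each input parameter to the graph characteristics it requires, and the second maps each graph object to the characteristics it has. The class `FrontEnd` and its methods are hypothetical.

```python
class FrontEnd:
    def __init__(self):
        self.commands = {}  # procedure name -> {parameter: required characteristics}
        self.graphs = {}    # graph object name -> set of characteristics

    def register_procedure(self, name, params):
        """Register a procedure as an available command (first meta-information)."""
        self.commands[name] = params

    def register_graph(self, graph_name, characteristics):
        """Record a graph object's characteristics (second meta-information)."""
        self.graphs[graph_name] = set(characteristics)

    def check(self, command, args):
        """Return a list of constraint violations; an empty list means the call is allowed."""
        params = self.commands[command]
        errors = [f"missing parameter: {p}" for p in params if p not in args]
        for param, graph_name in args.items():
            if param not in params:
                errors.append(f"unknown parameter: {param}")
                continue
            lacking = set(params[param]) - self.graphs.get(graph_name, set())
            if lacking:
                errors.append(f"{graph_name} lacks: {', '.join(sorted(lacking))}")
        return errors
```

For example, a procedure registered with `{"G": ["directed"]}` would be rejected for a graph object registered only as `["undirected"]`.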
Abstract:
Techniques are provided for latency-hiding context management for concurrent distributed tasks. A plurality of task objects is processed, including a first task object corresponding to a first task that includes access to first data residing on a remote machine. A first access request is added to a request buffer. A first task reference identifying the first task object is added to a companion buffer. A request message including the request buffer is sent to the remote machine. A response message is received, including first response data responsive to the first access request. For each response of one or more responses of the response message, the response is read from the response message, a next task reference is read from the companion buffer, and a next task corresponding to the next task reference is continued based on the response. The first task is identified and continued.
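The pairing of a request buffer with a companion buffer can be sketched as below. The sketch assumes that responses arrive in the same order as the requests, so that position alone links a response to its task reference; the `Task` class, `run_batch`, and the `remote_fetch` callable standing in for the remote machine are all hypothetical.

```python
class Task:
    def __init__(self, name, key):
        self.name = name
        self.key = key      # identifies the remote data this task needs
        self.result = None

    def continue_with(self, response):
        # Resume the task now that its remote data has arrived.
        self.result = response


def run_batch(tasks, remote_fetch):
    request_buffer = []    # access requests, sent together in one message
    companion_buffer = []  # task references, kept locally in the same order
    for index, task in enumerate(tasks):
        request_buffer.append(task.key)
        companion_buffer.append(index)

    # One request message, one response message (assumed to preserve order).
    responses = remote_fetch(request_buffer)

    for position, response in enumerate(responses):
        task_ref = companion_buffer[position]
        tasks[task_ref].continue_with(response)
    return tasks
```

Keeping the task references in a separate local buffer means the request message carries only the access requests, and no per-request context has to travel to the remote machine and back.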
Abstract:
Techniques for identifying common neighbors of two nodes in a graph are provided. One technique involves performing a binary split search and/or a linear search. Another technique involves creating a segmenting index for a first neighbor list. A second neighbor list is scanned and, for each node indicated in the second neighbor list, the segmenting index is used to determine whether the node is also indicated in the first neighbor list. Techniques are also provided for counting the number of triangles. One technique involves pruning nodes from neighbor lists based on the node values of the nodes whose neighbor lists are being pruned. Another technique involves sorting the nodes in a node array (and, thus, their respective neighbor lists) based on the nodes' respective degrees prior to identifying common neighbors. In this way, when pruning the neighbor lists, the neighbor lists of the highly connected nodes are significantly reduced.
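A minimal sketch of two of these ideas follows. It uses a plain binary search over sorted neighbor lists rather than the binary split search or segmenting index named in the abstract, and it assumes an undirected graph given as a dictionary in which every node appears as a key. Nodes are ranked by ascending degree and each neighbor list keeps only higher-ranked neighbors, which is one way to realize the pruning described above. The function names are hypothetical.

```python
from bisect import bisect_left


def common_neighbors(a, b):
    """Intersect two sorted neighbor lists by scanning the shorter one."""
    if len(a) > len(b):
        a, b = b, a
    found = []
    lo = 0
    for node in a:
        lo = bisect_left(b, node, lo)  # search only to the right of the last hit
        if lo == len(b):
            break
        if b[lo] == node:
            found.append(node)
    return found


def count_triangles(adj):
    """Count triangles in an undirected graph {node: list of neighbors}."""
    # Rank nodes by (degree, id) so highly connected nodes get high ranks.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    # Pruning: keep only higher-ranked neighbors, so high-degree nodes keep few.
    pruned = {
        rank[v]: sorted(rank[u] for u in adj[v] if rank[u] > rank[v])
        for v in adj
    }
    total = 0
    for v, higher in pruned.items():
        for u in higher:
            total += len(common_neighbors(higher, pruned[u]))
    return total
```

Under this ranking, each triangle is counted exactly once, at its lowest-ranked node.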
Abstract:
Techniques are provided for performing an invalidate operation in a non-coherent cache. In response to receiving an invalidate instruction, a cache unit only invalidates cache entries that are associated with invalidation data. In this way, a separate invalidate instruction is not required for each cache entry that is to be invalidated. Also, cache entries that are not to be invalidated remain unaffected by the invalidate operation. A cache entry may be associated with invalidation data if an address of the corresponding data item is in a particular set of addresses. The particular set of addresses may have been specified as a result of an invalidation instruction specified in code that is executing on a processor that is coupled to the cache.
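The selective invalidation can be modeled in software as a rough sketch. Real hardware would keep the invalidation data as bits alongside each cache line; here it is a flag stored with each entry. The class `NonCoherentCache` and the method names `track`, `fill`, and `invalidate` are hypothetical.

```python
class NonCoherentCache:
    def __init__(self):
        self.entries = {}     # address -> [value, invalidation flag]
        self.tracked = set()  # addresses named by earlier instructions in the code

    def track(self, addresses):
        """Specify the set of addresses whose entries should be invalidated later."""
        self.tracked.update(addresses)
        for address in addresses:
            if address in self.entries:
                self.entries[address][1] = True

    def fill(self, address, value):
        """Cache a data item, flagging it if its address is in the tracked set."""
        self.entries[address] = [value, address in self.tracked]

    def invalidate(self):
        """A single invalidate operation: drop flagged entries, leave the rest."""
        stale = [a for a, (_, flagged) in self.entries.items() if flagged]
        for address in stale:
            del self.entries[address]
        return len(stale)
```

One call to `invalidate` removes every flagged entry, so no per-entry instruction is needed and unflagged entries stay cached.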
Abstract:
The illustrative embodiments provide techniques that utilize graph topology information to partition work according to ranges of vertices so that each unit of work can be computed independently by different worker processes (inter-process parallelism). The illustrative embodiments also provide an approach for decomposing the graph neighbor matching operations and the property projection operation into fine-grained, configurable-size tasks that can be processed independently by threads (intra-process parallelism) without the need for expensive synchronization primitives. For graph neighbor matching operations, a given set of source vertices is split into smaller tasks that are assigned to dedicated threads for processing. Each thread is responsible for computing a number of matching source vertices and propagating them to the next graph match operator for further processing. For property projection operations, the computed graph paths are organized into rows that contain the requested properties for each element of the path (vertices and/or edges).
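The intra-process decomposition for neighbor matching might look like the sketch below. It assumes an adjacency dictionary and a caller-supplied predicate on the neighbor vertex; `split_into_tasks`, `match_neighbors`, and the `task_size` and `workers` parameters are hypothetical names. Each task writes only to its own result list, which is why no locks appear.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(vertices, task_size):
    """Cut the source vertices into fixed-size, independent tasks."""
    return [vertices[i:i + task_size] for i in range(0, len(vertices), task_size)]


def match_neighbors(adj, sources, predicate, task_size=2, workers=4):
    """Return (source, neighbor) pairs whose neighbor satisfies the predicate."""

    def run(task):
        # Only reads shared data and builds a private list: no synchronization.
        return [(s, n) for s in task for n in adj.get(s, ()) if predicate(n)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(run, split_into_tasks(sources, task_size)))
    # The combined list stands in for the input to the next match operator.
    return [pair for chunk in chunks for pair in chunk]
```

The task size is the tuning knob: smaller tasks balance load better across threads, while larger tasks reduce scheduling overhead.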
Abstract:
Techniques support graph pattern matching queries inside a relational database management system (RDBMS) that supports SQL execution. The techniques compile a graph pattern matching query that includes a bounded recursive pattern query into a SQL query that can then be executed by the relational engine. As a result, the techniques enable execution of graph pattern matching queries that include bounded recursive patterns on top of the relational engine without requiring any change to the existing SQL engine.
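As an illustration only, a bounded recursive pattern such as "between `min_hops` and `max_hops` edges" can be unrolled into a UNION of fixed-length join chains. The abstract does not say how the compilation is done, so this unrolling is an assumed translation, and the function name `compile_bounded_path` and the `src`/`dst` column names are hypothetical. The sketch uses SQLite through Python's standard `sqlite3` module to show that an unmodified SQL engine can run the result.

```python
import sqlite3


def compile_bounded_path(edge_table, min_hops, max_hops):
    """Return SQL yielding (src, dst) pairs linked by min_hops..max_hops edges.

    Assumes min_hops >= 1 and an edge table with columns src and dst.
    """
    branches = []
    for hops in range(min_hops, max_hops + 1):
        joins = [f"{edge_table} e1"]
        for i in range(2, hops + 1):
            joins.append(f"JOIN {edge_table} e{i} ON e{i}.src = e{i - 1}.dst")
        branches.append(
            f"SELECT e1.src AS src, e{hops}.dst AS dst FROM " + " ".join(joins)
        )
    return " UNION ".join(branches)


def reachable_pairs(edges, min_hops, max_hops):
    """Run the compiled query on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = conn.execute(compile_bounded_path("edges", min_hops, max_hops)).fetchall()
    conn.close()
    return sorted(rows)
```

Because the bound is known at compile time, the generated SQL contains no recursion, only ordinary joins and a UNION.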
Abstract:
A graph processing engine is provided for executing a graph query comprising a parent query and a subquery nested within the parent query. The subquery uses a reference to one or more correlated variables from the parent query. Executing the graph query comprises initiating execution of the parent query, pausing the execution of the parent query responsive to the parent query matching the one or more correlated variables in an intermediate result set, generating a subquery identifier for each match of the one or more correlated variables, modifying the subquery to include a subquery aggregate function and a clause to group results by subquery identifier, executing the modified subquery using the intermediate result set and collecting subquery results into a subquery results table responsive to pausing execution of the parent query, and resuming execution of the parent query using the subquery results table.
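The pause, batch, and resume flow can be sketched on a toy query: for each vertex `a`, count the neighbors that are older than `a`. The query, the data layout (an adjacency dictionary plus an `ages` dictionary), and the function name `run_query` are assumptions chosen for illustration; the abstract does not fix any of them.

```python
from collections import defaultdict


def run_query(adj, ages):
    """Return (vertex, count of older neighbors) for every vertex in adj."""
    # Parent query runs until the correlated variable `a` is matched, then pauses.
    intermediate = sorted(adj)

    # One subquery identifier per match of the correlated variable.
    ids = list(enumerate(intermediate))

    # Modified subquery: a COUNT aggregate grouped by subquery identifier,
    # evaluated over the whole intermediate result set rather than row by row.
    results_table = defaultdict(int)
    for subquery_id, a in ids:
        for neighbor in adj[a]:
            if ages[neighbor] > ages[a]:
                results_table[subquery_id] += 1

    # Parent query resumes and reads each aggregate from the results table.
    return [(a, results_table[subquery_id]) for subquery_id, a in ids]
```

Grouping by identifier lets the subquery run once for all parent matches instead of once per match.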