Abstract:
Embodiments of the invention relate to executing graph path queries. A database stores data entities and attributes in node tables and stores links between nodes in an edge table. Edges form a path between a source node and a target node. A source node set is generated and joined with the edge table to produce a first intermediate set. Similarly, a target node set is generated and joined with the edge table to produce a second intermediate set. A result path is generated by joining the first and second intermediate sets and applying a length condition.
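
The join-based evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the claimed method: the edge table is modeled as a list of (source, destination) tuples, and the names `expand_forward`, `expand_backward`, `path_query`, `fwd_hops`, `bwd_hops`, and `max_len` are hypothetical. The length condition is assumed to be an upper bound on the number of edges.

```python
def expand_forward(node_set, edges, hops):
    """Join a source node set with the edge table up to `hops` times."""
    frontier = [(n,) for n in node_set]
    found = []
    for _ in range(hops):
        # Each join appends one edge at the tail of every partial path.
        frontier = [p + (dst,) for p in frontier
                    for (src, dst) in edges if src == p[-1]]
        found.extend(frontier)
    return found


def expand_backward(node_set, edges, hops):
    """Join a target node set with the edge table up to `hops` times."""
    frontier = [(n,) for n in node_set]
    found = []
    for _ in range(hops):
        # Each join prepends one edge at the head of every partial path.
        frontier = [(src,) + p for p in frontier
                    for (src, dst) in edges if dst == p[0]]
        found.extend(frontier)
    return found


def path_query(sources, targets, edges, fwd_hops, bwd_hops, max_len):
    """Join the two intermediate sets on the meeting node, then filter by length."""
    first = expand_forward(sources, edges, fwd_hops)    # first intermediate set
    second = expand_backward(targets, edges, bwd_hops)  # second intermediate set
    results = set()
    for f in first:
        for b in second:
            if f[-1] == b[0]:               # join condition: paths meet at a node
                path = f + b[1:]
                if len(path) - 1 <= max_len:  # assumed length condition (edge count)
                    results.add(path)
    return sorted(results)


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
print(path_query({"a"}, {"d"}, edges, fwd_hops=2, bwd_hops=1, max_len=3))
```

A real embodiment would run these joins inside the database against the node and edge tables; the in-memory lists here only stand in for those tables.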
Abstract:
A computer determines social media influencers in a specific topic. The computer receives a dataset of information on a website, the information including a list of users of the website and a list of content that each user posts, wherein each user is associated with one or more other users. The computer identifies a plurality of variables associated with the dataset, wherein the plurality of variables represent the information of the dataset on the website. The computer executes a topic-specific search based on the plurality of variables, the topic-specific search providing at least one further list of users representing influencers in the specific topic.
Abstract:
Embodiments relate to subgraph-based distributed graph processing. An aspect includes receiving an input graph comprising a plurality of vertices. Another aspect includes partitioning the input graph into a plurality of subgraphs, each subgraph comprising internal vertices and boundary vertices. Another aspect includes assigning one or more respective subgraphs to each of a plurality of workers. Another aspect includes initiating processing of the plurality of subgraphs by performing a series of processing steps comprising: processing the internal vertices and boundary vertices internally within each of the subgraphs; detecting that a change was made to a boundary vertex of a first subgraph during the internal processing; and sending a message from a first worker to which the first subgraph is assigned to a second worker to which a second subgraph is assigned in response to detecting the change that was made to the boundary vertex of the first subgraph.
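
The processing steps above can be illustrated with a small single-process simulation. It is a sketch under assumptions, not the patented system: the workload is chosen here as minimum-label propagation for connected components, one worker is assumed per subgraph, workers are simulated sequentially, and the names `boundary_vertices`, `run_subgraph_rounds`, and `part_of` are hypothetical. A boundary vertex counts as changed when its label differs from the value last sent for it.

```python
from collections import defaultdict


def boundary_vertices(adj, part_of):
    """Vertices with at least one neighbor assigned to a different subgraph."""
    return {v for v, nbrs in adj.items()
            if any(part_of[n] != part_of[v] for n in nbrs)}


def run_subgraph_rounds(adj, part_of):
    labels = {v: v for v in adj}       # every vertex starts with its own id
    parts = defaultdict(set)
    for v, p in part_of.items():
        parts[p].add(v)
    boundary = boundary_vertices(adj, part_of)
    last_sent = {}                     # boundary vertex -> label last sent
    inbox = {p: [] for p in parts}
    active = set(parts)
    messages = 0
    while active:
        next_inbox = {p: [] for p in parts}
        for p in active:
            # Apply labels received from other workers.
            for v, lab in inbox[p]:
                labels[v] = min(labels[v], lab)
            # Internal processing: iterate inside the subgraph until stable.
            changed = True
            while changed:
                changed = False
                for v in parts[p]:
                    for n in adj[v]:
                        if part_of[n] == p and labels[n] < labels[v]:
                            labels[v] = labels[n]
                            changed = True
            # Detect changed boundary vertices and message the neighboring workers.
            for v in parts[p] & boundary:
                if last_sent.get(v) != labels[v]:
                    last_sent[v] = labels[v]
                    for n in adj[v]:
                        if part_of[n] != p:
                            next_inbox[part_of[n]].append((n, labels[v]))
                            messages += 1
        inbox = next_inbox
        active = {p for p, msgs in inbox.items() if msgs}
    return labels, messages


adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
part_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}
labels, messages = run_subgraph_rounds(adj, part_of)
print(labels, messages)
```

Because each subgraph is driven to a local fixed point before any communication, messages cross worker boundaries only for boundary vertices, which is the point of processing at subgraph rather than vertex granularity.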
Abstract:
Embodiments of the invention relate to sparsity-driven matrix representation. In one embodiment, a sparsity of a matrix is determined and the sparsity is compared to a threshold. Computer memory is allocated to store the matrix in a first data structure format based on the sparsity being greater than the threshold. Computer memory is allocated to store the matrix in a second data structure format based on the sparsity not being greater than the threshold.
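
A minimal sketch of the threshold test follows. The abstract does not say how sparsity is measured or what the two formats are, so this example assumes sparsity is the fraction of zero entries, the first format is a coordinate dictionary, and the second is a dense list of rows. The names `sparsity`, `store_matrix`, and the default threshold of 0.7 are hypothetical.

```python
def sparsity(rows):
    """Fraction of entries that are zero (assumed definition of sparsity)."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for x in r if x == 0)
    return zeros / total if total else 0.0


def store_matrix(rows, threshold=0.7):
    """Choose a storage format by comparing the sparsity to a threshold."""
    if sparsity(rows) > threshold:
        # First format (assumed): keep only nonzero cells, keyed by (row, column).
        cells = {(i, j): x
                 for i, r in enumerate(rows)
                 for j, x in enumerate(r) if x != 0}
        return ("sparse", cells)
    # Second format (assumed): a dense copy of every row.
    return ("dense", [list(r) for r in rows])


print(store_matrix([[0, 0, 0], [0, 5, 0], [0, 0, 0]]))
print(store_matrix([[1, 2], [3, 0]]))
```

A matrix whose sparsity equals the threshold exactly falls into the second format, matching the "not being greater than" wording above.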
Abstract:
Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method and a computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.
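
The partitioning of iterations into tasks can be sketched as follows. This example substitutes a local thread pool for MapReduce and uses fixed-size chunks as the execution plan; both are assumptions made for illustration, and the names `partition_iterations`, `run_parfor`, `task_size`, and `max_workers` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_iterations(iterations, task_size):
    """Split the iterations into tasks holding at least one iteration each."""
    items = list(iterations)
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def run_parfor(body, iterations, task_size, max_workers=4):
    """Run `body` once per iteration, executing the tasks in parallel."""
    tasks = partition_iterations(iterations, task_size)

    def run_task(task):
        # Iterations are independent, so a task can simply run them in order.
        return [body(i) for i in task]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_task = list(pool.map(run_task, tasks))  # map preserves task order
    return [result for task_result in per_task for result in task_result]


print(partition_iterations(range(10), 4))
print(run_parfor(lambda i: i * i, range(5), task_size=2))
```

The task size is the tuning knob in this sketch: larger tasks reduce scheduling overhead, while smaller tasks balance load more evenly across workers.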