Distributed graph processing system that adopts a faster data loading technique that requires low degree of communication
摘要:
Techniques minimize communication while loading a graph. In a distributed embodiment, each computer loads some edges of the graph. Each edge connects a source vertex (SV) to a destination vertex. For each SV of the edges, the computer hashes the SV to detect a tracking computer (TrC) that tracks on which computer does the SV reside. Each computer informs the TrC that the SV originates an edge that resides on that computer. For each SV, the TrC detects that the SV originates edges that reside on multiple providing computers (PCs). The TrC selects a target computer (TaC) from the multiple PCs to host the SV. The TrC instructs each PC, excluding the TaC, to transfer the SV and related edges that are connected to the SV to the TaC. A vertex's internal identifier indicates which computer hosts the vertex. The TrC maintains a mapping between external and internal identifiers.
信息查询
0/0