SPACE-EFFICIENT METHODOLOGY FOR REPRESENTING LABEL INFORMATION IN LARGE GRAPH DATA FOR FAST DISTRIBUTED GRAPH QUERY

    Publication No.: US20200073868A1

    Publication date: 2020-03-05

    Application No.: US16378424

    Filing date: 2019-04-08

    Abstract: Techniques are described herein for space-efficient encoding of label information of property graphs. In an embodiment, an input graph is received. The input graph comprises a plurality of entities and a plurality of label sets. Each entity of said plurality of entities is associated with a label set of the plurality of label sets and each label set of the plurality of label sets comprises zero or more labels of a plurality of labels. A first mapping is generated that maps each label of the plurality of labels to a label code. A second mapping is generated that maps each label integer set of a plurality of label integer sets to a label code. Each label integer set of the plurality of label integer sets corresponds to a label set of the plurality of label sets, wherein each label integer set of the plurality of label integer sets comprises label codes from the first mapping that are mapped to each label included in the corresponding label set. A compressed label set is generated for each entity of the plurality of entities. Each compressed label set comprises a plurality of bits that indicate a zeroth state, a first state, a second state, or a third state. The compressed label sets and the first and second mappings are used to efficiently evaluate graph label queries.
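
    The following minimal Python sketch makes the two mappings and the four-state compressed label set concrete, under stated assumptions. The abstract names four states but does not say what they mean. This sketch therefore assumes a 2-bit tag meaning "no labels", "one label (payload is its label code)", "several labels (payload is the label-set code)", and an unused reserved state. The names build_mappings, compress, and has_label are hypothetical, and the packing layout is illustrative, not the patented encoding.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical meaning of the four 2-bit states; the abstract does not define them.
EMPTY, SINGLE, MULTI, RESERVED = 0, 1, 2, 3  # RESERVED is unused in this sketch
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1

LabelCodes = Dict[str, int]
SetCodes = Dict[Tuple[int, ...], int]


def build_mappings(label_sets: List[FrozenSet[str]]) -> Tuple[LabelCodes, SetCodes]:
    """First mapping: label -> label code. Second mapping: label integer set -> code."""
    label_code: LabelCodes = {}
    for labels in label_sets:
        for label in sorted(labels):
            label_code.setdefault(label, len(label_code))
    set_code: SetCodes = {}
    for labels in label_sets:
        if len(labels) > 1:  # single labels are stored directly, so only sets of 2+ get a code
            key = tuple(sorted(label_code[lbl] for lbl in labels))
            set_code.setdefault(key, len(set_code))
    return label_code, set_code


def compress(labels: FrozenSet[str], label_code: LabelCodes, set_code: SetCodes) -> int:
    """Pack one entity's label set into a single integer: (payload << 2) | state."""
    if not labels:
        return EMPTY
    if len(labels) == 1:
        (label,) = labels
        return (label_code[label] << TAG_BITS) | SINGLE
    key = tuple(sorted(label_code[lbl] for lbl in labels))
    return (set_code[key] << TAG_BITS) | MULTI


def has_label(packed: int, label: str, label_code: LabelCodes,
              set_members: Dict[int, FrozenSet[int]]) -> bool:
    """Answer 'does this entity carry `label`?' using integer codes only."""
    code = label_code.get(label)
    if code is None:
        return False  # label never occurs in the graph
    state, payload = packed & TAG_MASK, packed >> TAG_BITS
    if state == SINGLE:
        return payload == code
    if state == MULTI:
        return code in set_members[payload]
    return False  # EMPTY (or the unused RESERVED state)


# Usage: three entities with one, two, and zero labels.
entities = {
    "alice": frozenset({"Person"}),
    "acme": frozenset({"Company", "Customer"}),
    "orphan": frozenset(),
}
label_code, set_code = build_mappings(list(entities.values()))
set_members = {code: frozenset(key) for key, code in set_code.items()}
packed = {name: compress(ls, label_code, set_code) for name, ls in entities.items()}
```

    In this sketch, an entity with a single label stores its label code directly, so only label sets with two or more labels need an entry in the second mapping.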

    Estimating Graph Size And Memory Consumption Of Distributed Graph For Efficient Resource Management

    Publication No.: US20250139163A1

    Publication date: 2025-05-01

    Application No.: US18384248

    Filing date: 2023-10-26

    Abstract: An estimator is provided that estimates the final size of a graph and its peak memory usage during loading, based on sampling of the graph data and on machine learning (ML) techniques. A data sampler samples data from files or databases and estimates statistics about the final graph; it also samples information about property data. Given the statistics gathered and estimated by the data sampler, a graph size estimator estimates how much memory the graph processing engine requires to load the graph. The final graph size is the amount of memory used to hold the final graph structures once loading is complete. The peak memory usage is the upper bound on memory usage that the graph processing engine reaches during loading.
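
    A small Python sketch can illustrate the estimation flow, assuming CSV-like input files. It samples the first rows of each file to extrapolate vertex and edge counts from file size. It then multiplies those counts by hypothetical per-element byte costs (BYTES_PER_VERTEX, BYTES_PER_EDGE) to estimate the final size. Finally, it predicts peak memory from past loads with an ordinary least-squares line. That regression is a deliberately simple stand-in for the ML model the abstract mentions; all names and numbers are illustrative assumptions.

```python
import os
import tempfile
from typing import Sequence, Tuple

# Hypothetical in-memory cost per element; a real engine would derive these
# from its own data structures and the sampled property types.
BYTES_PER_VERTEX = 64
BYTES_PER_EDGE = 24


def estimate_rows(path: str, sample_lines: int = 1000) -> float:
    """Estimate a file's row count from the average byte length of its first rows."""
    total_bytes = os.path.getsize(path)
    sampled = sampled_bytes = 0
    with open(path, "rb") as f:
        for line in f:
            sampled += 1
            sampled_bytes += len(line)
            if sampled >= sample_lines:
                break
    if sampled == 0:
        return 0.0
    return total_bytes / (sampled_bytes / sampled)


def estimate_final_size(est_vertices: float, est_edges: float) -> float:
    """Final graph size: estimated element counts times per-element cost."""
    return est_vertices * BYTES_PER_VERTEX + est_edges * BYTES_PER_EDGE


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares y = a*x + b; stands in for the ML model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Usage: synthetic vertex and edge files, plus made-up (final size, observed peak)
# pairs from earlier loads.
with tempfile.TemporaryDirectory() as tmp:
    vertex_path = os.path.join(tmp, "vertices.csv")
    edge_path = os.path.join(tmp, "edges.csv")
    with open(vertex_path, "w") as f:
        f.write("12345,abc\n" * 500)
    with open(edge_path, "w") as f:
        f.write("1,2\n" * 2000)
    est_v = estimate_rows(vertex_path, sample_lines=100)
    est_e = estimate_rows(edge_path, sample_lines=100)

final_size = estimate_final_size(est_v, est_e)
slope, intercept = fit_line([1000, 2000, 4000], [2500, 4500, 8500])
peak_estimate = slope * final_size + intercept
```

    Sampling only the head of a file is cheap, but it is biased if row lengths drift through the file; sampling random byte offsets would be more robust.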

    SUBQUERIES IN DISTRIBUTED ASYNCHRONOUS GRAPH QUERIES

    Publication No.: US20240220496A1

    Publication date: 2024-07-04

    Application No.: US18091249

    Filing date: 2022-12-29

    CPC classification number: G06F16/24535 G06F16/9024

    Abstract: A graph processing engine is provided for executing a graph query comprising a parent query and a subquery nested within the parent query. The subquery is an existential subquery with the following properties: it references one or more correlated variables from the parent query, is inlined in the parent query's pattern matching, has no post-processing phase, contains no global aggregation operations, references at most one non-correlated variable, and includes no filters on a non-correlated variable. Executing the graph query comprises: initiating execution of the parent query; responsive to the parent query matching the one or more correlated variables in an intermediate result set, executing the subquery by applying a neighbor pattern matching operator that checks for the existence of an edge; and resuming execution of the parent query based on the results of the neighbor pattern matching operation.
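
    The following Python sketch shows how a restricted EXISTS subquery can be replaced with a single neighbor check, using one in-memory adjacency map. It ignores the distributed, asynchronous execution the abstract targets. The query text in the comments is pseudo-syntax, and edge_exists, has_out_neighbor, and the two match functions are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency: vertex -> out-neighbours


def edge_exists(g: Graph, src: int, dst: int) -> bool:
    """Neighbor check used in place of running the subquery separately."""
    return dst in g.get(src, set())


def has_out_neighbor(g: Graph, src: int) -> bool:
    """Existence check for one unfiltered, non-correlated endpoint."""
    return bool(g.get(src))


def match_with_back_edge(g: Graph) -> List[Tuple[int, int]]:
    # Pseudo-syntax: MATCH (a)->(b) WHERE EXISTS { MATCH (b)->(a) }
    # Both subquery variables are correlated, so the subquery reduces to one edge test.
    return [(a, b) for a, outs in g.items() for b in outs if edge_exists(g, b, a)]


def match_with_onward_edge(g: Graph) -> List[Tuple[int, int]]:
    # Pseudo-syntax: MATCH (a)->(b) WHERE EXISTS { MATCH (b)->(x) }
    # x is the single non-correlated variable and carries no filter.
    return [(a, b) for a, outs in g.items() for b in outs if has_out_neighbor(g, b)]


graph: Graph = {1: {2, 3}, 2: {1}, 3: {4}, 4: set()}
back = sorted(match_with_back_edge(graph))
onward = sorted(match_with_onward_edge(graph))
```

    The sketch reflects both restrictions in the abstract. When every subquery variable is correlated, the check is a single edge lookup. When one endpoint is an unfiltered, non-correlated variable, the check only needs to know whether any such edge exists.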

    Deterministic semantic for graph property update queries and its efficient implementation

    Publication No.: US11928097B2

    Publication date: 2024-03-12

    Application No.: US17479006

    Filing date: 2021-09-20

    CPC classification number: G06F16/2315 G06F11/0772 G06F16/2365

    Abstract: Described herein is an efficient implementation of a deterministic semantic for property updates made by graph queries. Its determinism mechanisms ensure data consistency when a graph is mutated, and they allow graph access to be executed optimistically despite a potential data access conflict. The approach may combine activities such as detecting potential conflicts at query compile time, applying query transformations during code generation to eliminate those conflicts where possible, and executing updates optimistically in a way that fails safely if determinism cannot be guaranteed. In an embodiment, a computer receives a request to modify a graph. After preparation, and subject to the safety precautions presented herein, the request is executed optimistically. During that optimistic execution, a data access conflict actually occurs and is automatically detected. In response to the conflict, optimistic execution of the request is automatically halted early, before the request finishes executing.
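
    Here is a minimal Python sketch of optimistic execution with conflict detection. It assumes that a conflict means two matches writing different values to the same property of the same vertex. Writes are staged, the first conflicting write halts execution, and nothing is committed in that case. The sketch does not model the compile-time conflict analysis or the query transformations described in the abstract. WriteConflict, optimistic_set, and set_from_sources are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Props = Dict[int, Dict[str, object]]


class WriteConflict(Exception):
    """Raised when two matches would write different values to the same property."""


def optimistic_set(props: Props, updates: Iterable[Tuple[int, str, object]]) -> Props:
    """Stage every write; halt at the first conflict, commit only if none occurs."""
    staged: Dict[Tuple[int, str], object] = {}
    for vertex, key, value in updates:
        slot = (vertex, key)
        if slot in staged and staged[slot] != value:
            # The outcome would depend on match order, so fail safely before finishing.
            raise WriteConflict(f"conflicting writes to {slot}: {staged[slot]!r} vs {value!r}")
        staged[slot] = value
    for (vertex, key), value in staged.items():
        props.setdefault(vertex, {})[key] = value
    return props


def set_from_sources(props: Props, edges: List[Tuple[int, int]], key: str) -> Props:
    # Pseudo-syntax: MATCH (a)->(b) SET b.key = a.key
    return optimistic_set(props, ((b, key, props[a][key]) for a, b in edges))


# Usage: two sources agree, so the update is deterministic and commits.
agree = {1: {"s": 5}, 2: {"s": 5}, 3: {"s": 0}}
set_from_sources(agree, [(1, 3), (2, 3)], "s")

# Two sources disagree: the conflict is detected and nothing is written.
disagree = {1: {"s": 5}, 2: {"s": 7}, 3: {"s": 0}}
try:
    set_from_sources(disagree, [(1, 3), (2, 3)], "s")
    conflict_detected = False
except WriteConflict:
    conflict_detected = True
```

    Treating identical values as non-conflicting keeps results deterministic without rejecting harmless updates. A stricter policy could reject any second write to the same slot.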

    Optimizing graph queries by performing early pruning

    Publication No.: US11250059B2

    Publication date: 2022-02-15

    Application No.: US16738972

    Filing date: 2020-01-09

    Abstract: Techniques are described herein for early pruning of potential graph query results. Specifically, based on determining that property values of a path through graph data cannot affect results of a query, the path is pruned from a set of potential query solutions prior to fully exploring the path. Early solution pruning is performed on prunable queries that project prunable functions including MIN, MAX, SUM, and DISTINCT, the results of which are not tied to a number of paths explored for query execution. A database system implements early solution pruning for a prunable query based on intermediate results maintained for the query during query execution. Specifically, when a system determines that property values of a given potential solution path cannot affect the query results reflected in intermediate results maintained for the query, the path is discarded from the set of possible query solutions without further exploration of the path.
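
    The following Python sketch shows early pruning for a MIN aggregate. It assumes a two-hop pattern whose aggregated value is fixed as soon as the middle vertex is bound. If that value is no smaller than the current minimum (the intermediate result), the rest of the path is skipped. min_middle_value is a hypothetical name, and the query in the docstring is pseudo-syntax. The sketch covers MIN only; MAX, SUM, and DISTINCT are not shown.

```python
from typing import Dict, List, Optional, Tuple


def min_middle_value(adj: Dict[int, List[int]], val: Dict[int, int],
                     prune: bool = True) -> Tuple[Optional[int], int]:
    """Pseudo-syntax: SELECT MIN(b.val) MATCH (a)->(b)->(c).

    Returns the minimum and the number of (b)->(c) expansions performed.
    """
    best: Optional[int] = None  # intermediate result maintained during execution
    expansions = 0
    for a in adj:
        for b in adj[a]:
            if prune and best is not None and val[b] >= best:
                continue  # b.val cannot lower the MIN, so skip exploring c
            for _c in adj.get(b, []):
                expansions += 1
                if best is None or val[b] < best:
                    best = val[b]
    return best, expansions


adj = {1: [2, 3], 2: [4, 5], 3: [4, 5, 6], 4: [5], 5: [], 6: []}
val = {1: 9, 2: 3, 3: 8, 4: 1, 5: 7, 6: 0}
unpruned = min_middle_value(adj, val, prune=False)
pruned = min_middle_value(adj, val)
```

    On this toy graph, the pruned run performs 3 expansions instead of 7 and returns the same minimum, 1. Vertex 6 is not pruned, because its value 0 is below the current minimum. It still contributes nothing, because it has no outgoing edge to complete the path.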

    Fast distributed graph query engine

    Publication No.: US10990595B2

    Publication date: 2021-04-27

    Application No.: US16274210

    Filing date: 2019-02-12

    Abstract: Techniques are described herein for asynchronous execution of queries on statically replicated graph data. In an embodiment, a graph is partitioned among a plurality of computers executing a graph query engine. One or more high-degree vertices of the graph are each replicated in every graph partition. The partitions, including the replicated high-degree vertices, are loaded into the memory of the plurality of computers. To execute a query, a query plan is generated based on the query. The query plan specifies a plurality of operators and an order for those operators: if an operator requires data generated by another operator, the other operator is ordered before it in the query plan. Replicated copies of a vertex are visited if matches made by subsequent operators are limited by data unique to the replicated vertices.
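
    A small Python sketch can illustrate the static replication of high-degree vertices. It assumes a simple ownership rule (vertex id modulo partition count) and a fixed degree threshold, both hypothetical; the abstract does not say how partitions are chosen or how "high-degree" is defined.

```python
from collections import Counter
from typing import List, Set, Tuple


def partition_with_replication(edges: List[Tuple[int, int]], num_parts: int,
                               degree_threshold: int) -> Tuple[List[Set[int]], Set[int]]:
    """Assign each vertex to one partition, replicating high-degree vertices everywhere."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    high = {x for x, d in degree.items() if d >= degree_threshold}
    parts: List[Set[int]] = [set() for _ in range(num_parts)]
    for x in degree:
        if x in high:
            for part in parts:
                part.add(x)  # a replicated copy lives in every partition
        else:
            parts[x % num_parts].add(x)  # hypothetical owner rule: id modulo partition count
    return parts, high


# A hub (vertex 0) linked to six vertices, plus one extra edge.
edges = [(0, i) for i in range(1, 7)] + [(1, 2)]
parts, replicated = partition_with_replication(edges, num_parts=2, degree_threshold=4)
```

    Here vertex 0 has degree 6 and appears in both partitions. Edges touching the hub can therefore be matched locally on either machine instead of being routed to a single owner.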

    FAST DISTRIBUTED GRAPH QUERY ENGINE

    Publication No.: US20190354526A1

    Publication date: 2019-11-21

    Application No.: US16274210

    Filing date: 2019-02-12

    Abstract: Techniques are described herein for asynchronous execution of queries on statically replicated graph data. In an embodiment, a graph is partitioned among a plurality of computers executing a graph query engine. One or more high-degree vertices of the graph are each replicated in every graph partition. The partitions, including the replicated high-degree vertices, are loaded into the memory of the plurality of computers. To execute a query, a query plan is generated based on the query. The query plan specifies a plurality of operators and an order for those operators: if an operator requires data generated by another operator, the other operator is ordered before it in the query plan. Replicated copies of a vertex are visited if matches made by subsequent operators are limited by data unique to the replicated vertices.

    MULTI-STAGE PIPELINING FOR DISTRIBUTED GRAPH PROCESSING

    Publication No.: US20190342372A1

    Publication date: 2019-11-07

    Application No.: US15968637

    Filing date: 2018-05-01

    Abstract: Techniques are described herein for evaluating graph processing tasks using a multi-stage pipelining communication mechanism. In a multi-node system comprising a plurality of nodes, each node executes a respective communication agent object comprising a sender lambda function, an intermediate lambda function, and a final receiver lambda function. The sender lambda function is configured to perform one or more sending operations and to generate source messages based on those operations, each source message being marked for a particular node of the plurality of nodes. The intermediate lambda function is configured to read the source messages marked for and sent to its node, perform one or more intermediate operations based on those source messages, and generate intermediate messages based on the intermediate operations, each intermediate message being marked for a particular node of the plurality of nodes. The final receiver lambda function is configured to read the intermediate messages marked for and sent to its node, perform one or more final operations based on those intermediate messages, and generate a final result based on the final operations. On each node of the plurality of nodes, the communication agent object is executed, which comprises executing the sender lambda function, the intermediate lambda function, and the final receiver lambda function.
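
    The following sketch simulates the three stages in a single Python process, assuming a toy task: computing an in-degree histogram across two nodes. The sender routes each edge's destination to its owner. The intermediate stage counts in-degrees and routes each count onward, and the final receiver builds its part of the histogram. Message exchange is a plain in-memory shuffle, with no real networking and no overlap between stages. owner, exchange, and run_pipeline are hypothetical helpers.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Message = Tuple[int, object]  # (target node, payload)
NUM_NODES = 2


def owner(x: int) -> int:
    """Hypothetical placement rule: value modulo node count."""
    return x % NUM_NODES


def exchange(outputs: Iterable[List[Message]]) -> Dict[int, List[object]]:
    """Deliver every message to the inbox of the node it is marked for."""
    inbox: Dict[int, List[object]] = defaultdict(list)
    for messages in outputs:
        for target, payload in messages:
            inbox[target].append(payload)
    return inbox


def sender(node: int, local_edges: List[Tuple[int, int]]) -> List[Message]:
    # Stage 1: route each edge's destination vertex to that vertex's owner.
    return [(owner(dst), dst) for _src, dst in local_edges]


def intermediate(node: int, dsts: List[object]) -> List[Message]:
    # Stage 2: count in-degree per owned vertex, then route each degree to its owner.
    return [(owner(deg), deg) for deg in Counter(dsts).values()]


def receiver(node: int, degrees: List[object]) -> Counter:
    # Stage 3: histogram of in-degrees for the degree values this node owns.
    return Counter(degrees)


def run_pipeline(local_inputs: List[List[Tuple[int, int]]]) -> List[Counter]:
    stage1 = exchange(sender(n, local_inputs[n]) for n in range(NUM_NODES))
    stage2 = exchange(intermediate(n, stage1[n]) for n in range(NUM_NODES))
    return [receiver(n, stage2[n]) for n in range(NUM_NODES)]


# Each simulated node holds a slice of the edge list.
results = run_pipeline([[(0, 1), (2, 1), (4, 3)], [(1, 3), (3, 1), (5, 2)]])
histogram = sum(results, Counter())
```

    Each node's final result covers only the degree values it owns; summing the per-node counters gives the global histogram.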

    Scouting queries for improving query planning in distributed asynchronous graph queries

    Publication No.: US12174831B2

    Publication date: 2024-12-24

    Application No.: US18073629

    Filing date: 2022-12-02

    Abstract: A graph processing system is provided for executing scouting queries to improve query planning. A query planner creates a plurality of scouting queries, each corresponding to a query plan for a graph query and having an associated confidence value. The graph processing system performs a limited execution of the scouting queries and determines a metric value for each scouting query based on its execution. The system then determines a score for each scouting query based on its metric value and the confidence value of the corresponding query plan, and selects a query plan based on those scores. Finally, the system executes the graph query using the selected query plan.
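
    Here is a minimal Python sketch of scouting-based plan selection. It assumes that the metric is the number of result rows produced per execution step within a fixed step budget, and that the score is simply metric times confidence. Plans are modeled as iterators of steps rather than real operators. scout and select_plan are hypothetical names; the abstract does not specify the metric or the scoring function.

```python
import itertools
from typing import Callable, Iterator, List, Tuple

# A plan is modeled as a factory returning an iterator of execution steps;
# each step yields True if it produced a result row, False otherwise.
PlanFactory = Callable[[], Iterator[bool]]


def scout(plan: PlanFactory, budget: int) -> float:
    """Limited execution: run at most `budget` steps, return rows produced per step."""
    steps = produced = 0
    for produced_row in itertools.islice(plan(), budget):
        steps += 1
        produced += int(produced_row)
    return produced / steps if steps else 0.0


def select_plan(candidates: List[Tuple[str, PlanFactory, float]],
                budget: int = 30) -> Tuple[str, List[Tuple[float, str]]]:
    """Score = metric * planner confidence (a hypothetical scoring rule)."""
    scored = [(scout(plan, budget) * confidence, name)
              for name, plan, confidence in candidates]
    return max(scored)[1], scored


candidates = [
    ("plan_a", lambda: itertools.cycle([False, False, True]), 0.9),  # 1 row / 3 steps
    ("plan_b", lambda: itertools.cycle([True, False]), 0.8),  # 1 row / 2 steps
    ("plan_c", lambda: iter([True] * 5), 0.2),  # finishes early
]
best_plan, scores = select_plan(candidates)
```

    In this example, plan_b wins despite having lower confidence than plan_a, because its scouting run produced rows faster (0.5 * 0.8 = 0.4 versus about 0.33 * 0.9 = 0.3).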
