-
公开(公告)号:US20230237047A1
公开(公告)日:2023-07-27
申请号:US17585117
申请日:2022-01-26
Applicant: Oracle International Corporation
Inventor: Vasileios Trigonakis , Paul Renauld , Jinsu Lee , Petr Koupy , Sungpack Hong , Hassan Chafi
IPC: G06F16/23 , G06F16/901
CPC classification number: G06F16/2379 , G06F16/9024
Abstract: Data structures and methods are described for applying mutations on a distributed graph in a fast and memory-efficient manner. Nodes in a distributed graph processing system may store graph information such as vertices, edges, properties, vertex keys, vertex degree counts, and other information in graph arrays, which are divided into shared arrays and delta logs. The shared arrays on a local node remain immutable and are the starting point of a graph, on top of which mutations build new snapshots. Mutations may be supported at both the entity and table levels. Periodic delta log consolidation may occur at multiple levels to prevent excessive delta log buildup. Consolidation at the table level may also trigger rebalancing of vertices across the nodes.
-
公开(公告)号:US20230139718A1
公开(公告)日:2023-05-04
申请号:US17513760
申请日:2021-10-28
Applicant: Oracle International Corporation
Inventor: Mojtaba Valipour , Yasha Pushak , Robert Harlow , Hesam Fathi Moghadam , Sungpack Hong , Hassan Chafi
Abstract: Herein are acceleration and increased reliability based on classification and scoring techniques for machine learning that compare two similar datasets of different ages to detect data drift without a predefined drift threshold. Various subsets are randomly sampled from the datasets. The subsets are combined in various ways to generate subsets of various age mixtures. In an embodiment, ages are permuted and drift is detected based on whether or not fitness scores indicate that an age binary classifier is confused. In an embodiment, an anomaly detector measures outlier scores of two subsets of different age mixtures. Drift is detected when the outlier scores diverge. In a two-arm bandit embodiment, iterations randomly alternate between both datasets based on respective probabilities that are adjusted by a bandit reward based on outlier scores from an anomaly detector. Drift is detected based on the probability of the younger dataset.
-
公开(公告)号:US20230121198A1
公开(公告)日:2023-04-20
申请号:US18084406
申请日:2022-12-19
Applicant: Oracle International Corporation
Inventor: Petr Koupy , Thomas Manhardt , Siegfried Depner , Sungpack Hong , Hassan Chafi
Abstract: Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
-
公开(公告)号:US11561780B2
公开(公告)日:2023-01-24
申请号:US17069104
申请日:2020-10-13
Applicant: Oracle International Corporation
Inventor: Petr Koupy , Thomas Manhardt , Siegfried Depner , Sungpack Hong , Hassan Chafi
IPC: G06F16/00 , G06F8/41 , G06F9/50 , G06F17/16 , G06F16/27 , G06F17/10 , G06F11/34 , G06F3/06 , G06F11/10 , G06F16/901
Abstract: Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
-
公开(公告)号:US20220215055A1
公开(公告)日:2022-07-07
申请号:US17141018
申请日:2021-01-04
Applicant: Oracle International Corporation
Inventor: Arnaud Delamare , Vasileios Trigonakis , Yahya Ez-Zainabi , Sungpack Hong , Hassan Chafi
IPC: G06F16/901
Abstract: Techniques are provided for finding unused vertex and edge identifiers (IDs) in a distributed graph engine. A run-time data structure may be built during the loading of the graph. The data structure identifies unavailable IDs that are associated with graph entities of the graph. The data structure is traversed to determine one or more ranges of free IDs. Unused IDs are generated from the ranges.
-
公开(公告)号:US20220179859A1
公开(公告)日:2022-06-09
申请号:US17116831
申请日:2020-12-09
Applicant: Oracle International Corporation
Inventor: Tomas Faltin , Vasileios Trigonakis , Jean-Pierre Lozi , Sungpack Hong , Hassan Chafi
IPC: G06F16/2452 , G06F16/2455 , G06F16/248
Abstract: Systems and methods for improving evaluation of graph queries through depth first traversals are described herein. In an embodiment, a multi-node system evaluates against graph data a graph query that specifies a particular pattern to match by determining, at a first node of the multi-node system, in a particular instance of evaluating the graph query, that one or more first vertices on the first node match a first portion of the graph query and that a second vertex that is to be evaluated next is stored on a second node separate from the first node. In response to determining that the next vertex to be evaluated is stored on the second node separate from the first node, the first node generates a message to the second node comprising one or more results of the first portion of the graph query based on the one or more first vertices, an identifier of the next vertex, and a current stage of evaluating the graph query. In response to generating the message from the first node to the second node, the first node ceases the particular instance of evaluating the graph query.
-
公开(公告)号:US20220114178A1
公开(公告)日:2022-04-14
申请号:US17162564
申请日:2021-01-29
Applicant: Oracle International Corporation
Inventor: Vlad Ioan Haprian , Mikael Gonzalez Morales , Laurent Daynes , Zhen Hua Liu , Danica Porobic , Marco Arnaboldi , Jean-Pierre Lozi , Hugo Kapp , Sungpack Hong , Shasank Kisan Chavan , Hassan Chafi
IPC: G06F16/2455 , G06F16/26 , G06F16/2453 , G06F16/248 , G06F16/22
Abstract: Techniques are provided for processing a graph query by exploiting an in-memory graph index and minimizing the number of storage accesses needed to project properties of generated paths. A predefined number of paths from a graph query runtime is accumulated, using different data structures, before executing storage accesses necessary to retrieve all properties needed. A first data structure stores all paths from the graph query runtime that hit cache(s) entirely. A second data structure stores paths that do not hit caches at any level or only a subset of the levels does. Once any of these data structures are full, result rows are produced based on the two data structures prior to extracting more paths from the graph query runtime.
-
公开(公告)号:US11210224B2
公开(公告)日:2021-12-28
申请号:US16714111
申请日:2019-12-13
Applicant: Oracle International Corporation
Inventor: Sungpack Hong , Hassan Chafi , Eric Sedlar
IPC: G06F12/0837 , G06F12/0808
Abstract: Techniques are provided for performing a flush operation in a non-coherent cache. In response to determining to perform a flush operation, a cache unit flushes certain data items. The flush operation may be performed in response to a lapse of a particular amount of time, such as a number of cycles, or an explicit flush instruction that does not indicate any cache entry or data item. The cache unit may store change data that indicates which entry stores a data item that has been modified but not yet been flushed. The change data may be used to identify the entries that need to be flushed. In one technique, a dirty cache entry that is associated with one or more relatively recent changes is not flushed during a flush operation.
-
公开(公告)号:US11120082B2
公开(公告)日:2021-09-14
申请号:US15956115
申请日:2018-04-18
Applicant: Oracle International Corporation
Inventor: Damien Hilloulin , Davide Bartolini , Oskar Van Rest , Alexander Weld , Sungpack Hong , Hassan Chafi
IPC: G06F16/901 , G06F16/28 , G06F16/22
Abstract: Techniques are provided herein for efficient representation of heterogeneous graphs in memory. In an embodiment, vertices and edges of the graph are segregated by type. Each property of a type of vertex or edge has values stored in a respective vector. Directed or undirected edges of a same type are stored in compressed sparse row (CSR) format. The CSR format is more or less repeated for edge traversal in either forward or reverse direction. An edge map translates edge offsets obtained from traversal in the reverse direction for use with data structures that expect edge offsets in the forward direction. Subsequent filtration and/or traversal by type or property of vertex or edge entails minimal data access and maximal data locality, thereby increasing efficient use of the graph.
-
120.
公开(公告)号:US11093459B2
公开(公告)日:2021-08-17
申请号:US16747827
申请日:2020-01-21
Applicant: Oracle International Corporation
Inventor: Marco Arnaboldi , Jean-Pierre Lozi , Laurent Phillipe Daynes , Vlad Ioan Haprian , Shasank Kisan Chavan , Hugo Kapp , Sungpack Hong
Abstract: Herein are techniques that concurrently populate entries in a compressed sparse row (CSR) encoding, of a type of edge of a heterogenous graph. In an embodiment, a computer obtains a mapping of a relational schema to a graph data model. The relational schema defines vertex tables that correspond to vertex types in the graph data model, and edge tables that correspond to edge types in the graph data model. Each edge type is associated with a source vertex type and a target vertex type. For each vertex type, a sequence of persistent identifiers of vertices is obtained. Based on the mapping and for a CSR representation of each edge type, a source array is populated that, for a same vertex ordering as the sequence of persistent identifiers for the source vertex type, is based on counts of edges of the edge type that originate from vertices of the source vertex type. For the CSR, the computer populates, in parallel and based on said mapping, a destination array that contains canonical offsets as sequence positions within the sequence of persistent identifiers of the vertices.
-
-
-
-
-
-
-
-
-