Invalid traffic detection using explainable unsupervised graph ML

    公开(公告)号:US12184692B2

    公开(公告)日:2024-12-31

    申请号:US17558342

    申请日:2021-12-21

    Abstract: Herein are graph machine learning explainability (MLX) techniques for invalid traffic detection. In an embodiment, a computer generates a graph that contains: a) domain vertices that represent network domains that received requests and b) address vertices that respectively represent network addresses from which the requests originated. Based on the graph, domain embeddings are generated that respectively encode the domain vertices. Based on the domain embeddings, multidomain embeddings are generated that respectively encode the network addresses. The multidomain embeddings are organized into multiple clusters of multidomain embeddings. A particular cluster is detected as suspicious. In an embodiment, an unsupervised trained graph model generates the multidomain embeddings. Based on the clusters of multidomain embeddings, feature importances are unsupervised trained. Based on the feature importances, an explanation is automatically generated for why an object is or is not suspicious. The explained object may be a cluster or other batch of network addresses or a single network address.

    SUBQUERIES IN DISTRIBUTED ASYNCHRONOUS GRAPH QUERIES

    公开(公告)号:US20240220495A1

    公开(公告)日:2024-07-04

    申请号:US18091242

    申请日:2022-12-29

    CPC classification number: G06F16/24535 G06F16/24537 G06F16/9024

    Abstract: A graph processing engine is provided for executing a graph query comprising a parent query and a subquery nested within the parent query. The subquery uses a reference to one or more correlated variables from the parent query. Executing the graph query comprises initiating execution of the parent query, pausing the execution of the parent query responsive to the parent query matching the one or more correlated variables in an intermediate result set, generating a subquery identifier for each match of the one or more correlated variables, modifying the subquery to include a subquery aggregate function and a clause to group results by subquery identifier, executing the modified subquery using the intermediate result set and collecting subquery results into a subquery results table responsive to pausing execution of the parent query, and resuming execution of the parent query using the subquery results table.

    Duplication elimination in depth based searches for distributed systems

    公开(公告)号:US12001425B2

    公开(公告)日:2024-06-04

    申请号:US17116831

    申请日:2020-12-09

    CPC classification number: G06F16/24526 G06F16/24556 G06F16/248

    Abstract: Systems and methods for improving evaluation of graph queries through depth first traversals are described herein. In an embodiment, a multi-node system evaluates against graph data a graph query that specifies a particular pattern to match by determining, at a first node of the multi-node system, in a particular instance of evaluating the graph query, that one or more first vertices on the first node match a first portion of the graph query and that a second vertex that is to be evaluated next is stored on a second node separate from the first node. In response to determining that the next vertex to be evaluated is stored on the second node separate from the first node, the first node generates a message to the second node comprising one or more results of the first portion of the graph query based on the one or more first vertices, an identifier of the next vertex, and a current stage of evaluating the graph query. In response to generating the message from the first node to the second node, the first node ceases the particular instance of evaluating the graph query.

    LEARNING HYPER-PARAMETER SCALING MODELS FOR UNSUPERVISED ANOMALY DETECTION

    公开(公告)号:US20240095604A1

    公开(公告)日:2024-03-21

    申请号:US18075784

    申请日:2022-12-06

    CPC classification number: G06N20/20

    Abstract: A computer sorts empirical validation scores of validated training scenarios of an anomaly detector. Each training scenario has a dataset to train an instance of the anomaly detector that is configured with values for hyperparameters. Each dataset has values for metafeatures. For each predefined ranking percentage, a subset of best training scenarios is selected that consists of the ranking percentage of validated training scenarios having the highest empirical validation scores. Linear optimizers train to infer a value for a hyperparameter. Into many distinct unvalidated training scenarios, a scenario is generated that has metafeatures values and hyperparameters values that contains the value inferred for that hyperparameter by a linear optimizer. For each unvalidated training scenario, a validation score is inferred. A best linear optimizer is selected having a highest combined inferred validation score. For a new dataset, the best linear optimizer infers a value of that hyperparameter.

    CHROMOSOME REPRESENTATION LEARNING IN EVOLUTIONARY OPTIMIZATION TO EXPLOIT THE STRUCTURE OF ALGORITHM CONFIGURATION

    公开(公告)号:US20240070471A1

    公开(公告)日:2024-02-29

    申请号:US17900779

    申请日:2022-08-31

    CPC classification number: G06N3/126

    Abstract: Principal component analysis (PCA) accelerates and increases accuracy of genetic algorithms. In an embodiment, a computer generates many original chromosomes. Each original chromosome contains a sequence of original values. Each position in the sequences in the original chromosomes corresponds to only one respective distinct parameter in a set of parameters to be optimized. Based on the original chromosomes, many virtual chromosomes are generated. Each virtual chromosome contains a sequence of numeric values. Positions in the sequences in the virtual chromosomes do not correspond to only one respective distinct parameter in the set of parameters to be optimized. Based on the virtual chromosomes, many new chromosomes are generated. Each new chromosome contains a sequence of values. Each position in the sequences in the new chromosomes corresponds to only one respective distinct parameter in the set of parameters to be optimized. The computer may be configured based on a best new chromosome.

    INVALID TRAFFIC DETECTION USING EXPLAINABLE UNSUPERVISED GRAPH ML

    公开(公告)号:US20230199026A1

    公开(公告)日:2023-06-22

    申请号:US17558342

    申请日:2021-12-21

    CPC classification number: H04L63/1483 H04L63/1425 G06N20/00

    Abstract: Herein are graph machine learning explainability (MLX) techniques for invalid traffic detection. In an embodiment, a computer generates a graph that contains: a) domain vertices that represent network domains that received requests and b) address vertices that respectively represent network addresses from which the requests originated. Based on the graph, domain embeddings are generated that respectively encode the domain vertices. Based on the domain embeddings, multidomain embeddings are generated that respectively encode the network addresses. The multidomain embeddings are organized into multiple clusters of multidomain embeddings. A particular cluster is detected as suspicious. In an embodiment, an unsupervised trained graph model generates the multidomain embeddings. Based on the clusters of multidomain embeddings, feature importances are unsupervised trained. Based on the feature importances, an explanation is automatically generated for why an object is or is not suspicious. The explained object may be a cluster or other batch of network addresses or a single network address.

    Dynamic asynchronous traversals for distributed graph queries

    公开(公告)号:US11675785B2

    公开(公告)日:2023-06-13

    申请号:US16778668

    申请日:2020-01-31

    CPC classification number: G06F16/24526 G06F16/2471

    Abstract: Techniques are described for enabling in-memory execution of any-sized graph data query by utilizing both depth first search (DFS) principles and breadth first search (BFS) principles to control the amount of memory used during query execution. Specifically, threads implementing a graph DBMS switch between a BFS mode of data traversal and a DFS mode of data traversal. For example, when a thread detects that there are less than a configurable threshold number of intermediate results in memory, the thread enters BFS-based traversal techniques to increase the number of intermediate results in memory. When the thread detects that there are at least the configurable threshold number of intermediate results in memory, the thread enters DFS mode to produce final results, which generally works to move the intermediate results that are currently available in memory to final query results, thereby reducing the number of intermediate results in memory.

Patent Agency Ranking