Graph-data partitioning for workload-balanced distributed computation with cost estimation functions
    141.
    Granted invention patent
    Status: Active

    Publication number: US09477532B1

    Publication date: 2016-10-25

    Application number: US14876075

    Application date: 2015-10-06

    CPC classification number: G06F9/5083 G06F9/4881 G06F2209/5022

    Abstract: Techniques herein perform workload-balanced graph partitioning. Each graph partition is distributed to a respective computer. Each computer applies a workload-estimation function to its partition to calculate a numeric workload-value that indicates how much computation the partition needs. Each computer sends its numeric workload-value to a master computer. The master compares the highest and lowest numeric workload-values. If the difference exceeds a threshold, the master determines how much work overloaded computers should offload to under-utilized computers. To each overloaded computer, the master sends a directive with a balancing numeric workload-value that indicates how much computation to offload and an identifier of an under-utilized computer to receive the offload. Based on this directive and the workload-estimation function, an overloaded computer selects a portion of its partition that corresponds to the balancing numeric workload-value, removes that portion from its partition, and transfers the portion to the under-utilized computer, which adds the portion to its partition.
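
The balancing loop in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost function (one unit per vertex plus one per edge), the choice to offload half of the gap, the greedy portion selection, and all names (`Directive`, `estimate_workload`, `plan_rebalance`, `select_portion`, `apply_directive`) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Directive:
    source: str    # overloaded computer
    target: str    # under-utilized computer that receives the offload
    amount: float  # balancing workload-value to offload


def estimate_workload(partition):
    """Assumed cost function: one unit per vertex plus one per outgoing edge."""
    return sum(1 + len(neighbors) for neighbors in partition.values())


def plan_rebalance(workloads, threshold):
    """Master step: compare the highest and lowest reported workload-values."""
    high = max(workloads, key=workloads.get)
    low = min(workloads, key=workloads.get)
    gap = workloads[high] - workloads[low]
    if gap <= threshold:
        return []
    # Assumption: moving half of the gap roughly equalizes the two computers.
    return [Directive(high, low, gap / 2)]


def select_portion(partition, amount):
    """Worker step: greedily pick vertices whose summed cost stays within amount."""
    chosen, total = {}, 0.0
    for vertex, neighbors in sorted(partition.items()):
        cost = 1 + len(neighbors)
        if total + cost <= amount:
            chosen[vertex] = neighbors
            total += cost
    return chosen


def apply_directive(partitions, directive):
    """Remove the selected portion from the source and add it to the target."""
    portion = select_portion(partitions[directive.source], directive.amount)
    for vertex in portion:
        del partitions[directive.source][vertex]
    partitions[directive.target].update(portion)
```

Here each partition is a plain dictionary mapping a vertex to its neighbor list, and `partitions` maps a computer name to its partition. A real system would send the directive and the portion over the network instead of mutating a shared dictionary.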

    DISTRIBUTED GRAPH PROCESSING SYSTEM THAT SUPPORT REMOTE DATA READ WITH PROACTIVE BULK DATA TRANSFER
    142.
    Invention application
    Status: Under examination (published)

    Publication number: US20160292303A1

    Publication date: 2016-10-06

    Application number: US14678358

    Application date: 2015-04-03

    CPC classification number: G06F16/9024 G06F16/254

    Abstract: Techniques for generating and transferring bulk messages from one computing device to another computing device in a cluster are provided. Each computing device in a cluster is assigned a different set of nodes of a graph. A first computing device may be assigned a particular node that is neighbors with multiple other nodes that are assigned to one or more other computing devices in the cluster. When processing graph-related code at the first computing device, information about the neighbors may be required. The first computing device receives a bulk message from one of the other computing devices. The bulk message contains information about at least a subset of the neighbors. Therefore, the first computing device is not required to send multiple messages for information about the subset of neighbors. In fact, the first computing device is not required to send any message for the information.
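
As a rough illustration of the proactive bulk transfer described above, the sketch below groups a sender's node data by the remote device that will need it, so each remote device receives one message without asking. The function name `build_bulk_messages`, the `owner`/`values` dictionaries, and the treatment of edges as undirected are assumptions made for this example, not details from the application.

```python
from collections import defaultdict


def build_bulk_messages(owner, edges, values, sender):
    """Return {remote_device: {local_node: value}} for one sending device.

    owner  maps node -> device id (each device holds a different set of nodes)
    edges  is a list of (u, v) pairs, treated as undirected here (assumption)
    values maps node -> the per-node information a neighbor may need
    """
    bulk = defaultdict(dict)
    for u, v in edges:
        for local, remote in ((u, v), (v, u)):
            if owner[local] == sender and owner[remote] != sender:
                # The remote device owns a neighbor of `local`, so it will
                # need this value; pack it into that device's bulk message.
                bulk[owner[remote]][local] = values[local]
    return dict(bulk)
```

With this layout the receiving device looks up neighbor data in the bulk message it already holds instead of issuing one request per neighbor.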

    SNAPSHOT-CONSISTENT, IN-MEMORY GRAPH INSTANCES IN A MULTI-USER DATABASE
    143.
    Invention application
    Status: Active

    Publication number: US20160019228A1

    Publication date: 2016-01-21

    Application number: US14332182

    Application date: 2014-07-15

    CPC classification number: G06F17/30958 G06F17/30327 G06F2201/80

    Abstract: Techniques for storing and processing graph data in a database system are provided. Graph data (or a portion thereof) that is stored in persistent storage is loaded into memory to generate an instance of a particular graph. The instance is consistent as of a particular point in time. Graph analysis operations are performed on the instance. The instance may be used by multiple users to perform graph analysis operations. Subsequent changes to the graph are stored separate from the instance. Later, the changes may be applied to the instance (or a copy thereof) to refresh the instance.
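
The snapshot-and-refresh idea can be sketched as follows. This is a toy model under assumptions: `GraphStore` stands in for persistent storage, a simple version counter stands in for the database's notion of a point in time, and only edge insertions are logged. None of these names or choices come from the application.

```python
import copy


class GraphStore:
    """Hypothetical stand-in for persistent graph storage with a change log."""

    def __init__(self):
        self.edges = {}   # vertex -> set of neighbor vertices
        self.log = []     # (version, u, v) changes, in commit order
        self.version = 0

    def add_edge(self, u, v):
        self.version += 1
        self.edges.setdefault(u, set()).add(v)
        self.edges.setdefault(v, set())
        # The change is recorded separately from any loaded instance.
        self.log.append((self.version, u, v))

    def load_instance(self):
        """Build an in-memory instance consistent as of the current version."""
        return GraphInstance(copy.deepcopy(self.edges), self.version)

    def changes_since(self, version):
        return [change for change in self.log if change[0] > version]


class GraphInstance:
    """In-memory graph that stays fixed until refresh() is called."""

    def __init__(self, edges, version):
        self.edges = edges
        self.version = version

    def degree(self, vertex):
        return len(self.edges.get(vertex, ()))

    def refresh(self, store):
        """Apply the changes logged after this instance's version."""
        for version, u, v in store.changes_since(self.version):
            self.edges.setdefault(u, set()).add(v)
            self.edges.setdefault(v, set())
            self.version = version
```

Several users can run read-only analyses such as `degree` against the same instance while writes continue to land in the store.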

    Expert-optimal correlation: contamination factor identification for unsupervised anomaly detection

    Publication number: US12299553B2

    Publication date: 2025-05-13

    Application number: US18075824

    Application date: 2022-12-06

    Abstract: In a computer, each of multiple anomaly detectors infers an anomaly score for each of many tuples. For each tuple, a synthetic label is generated that indicates for each anomaly detector: the anomaly detector, the anomaly score inferred by the anomaly detector for the tuple and, for each of multiple contamination factors, the contamination factor and, based on the contamination factor, a binary class of the anomaly score. For each particular anomaly detector excluding a best anomaly detector, a similarity score is measured for each contamination factor. The similarity score indicates how similar, between the particular anomaly detector and the best anomaly detector, are the binary classes of labels with that contamination factor. For each contamination factor, a combined similarity score is calculated based on the similarity scores for the contamination factor. Based on a contamination factor that has the highest combined similarity score, the computer detects that an additional anomaly detector is inaccurate.
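
A minimal sketch of the contamination-factor search is given below. It assumes that the binary class is obtained by flagging the top fraction of scores, that similarity is the fraction of tuples on which two detectors agree, and that the combined score is a plain mean; the abstract does not fix any of these, and the function names are hypothetical. The final step of flagging an inaccurate detector is not modeled.

```python
def binarize(scores, contamination):
    """Flag the top `contamination` fraction of scores as anomalous (1)."""
    k = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def agreement(classes_a, classes_b):
    """Assumed similarity score: fraction of tuples with the same binary class."""
    same = sum(x == y for x, y in zip(classes_a, classes_b))
    return same / len(classes_a)


def best_contamination(detector_scores, best, factors):
    """Return (factor with highest combined similarity, all combined scores).

    detector_scores maps detector name -> list of anomaly scores per tuple
    best            is the name of the reference ("best") detector
    """
    combined = {}
    for factor in factors:
        reference = binarize(detector_scores[best], factor)
        sims = [
            agreement(binarize(scores, factor), reference)
            for name, scores in detector_scores.items()
            if name != best
        ]
        combined[factor] = sum(sims) / len(sims)  # assumed combination: mean
    return max(combined, key=combined.get), combined
```

The chosen factor is the one at which the other detectors most closely agree with the reference detector about which tuples are anomalous.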

    Estimating Graph Size And Memory Consumption Of Distributed Graph For Efficient Resource Management

    Publication number: US20250139163A1

    Publication date: 2025-05-01

    Application number: US18384248

    Application date: 2023-10-26

    Abstract: An estimator is provided that can be used to get an estimate of final graph size and peak memory usage of the graph during loading, based on sampling of the graph data and using machine learning (ML) techniques. A data sampler samples the data from files or databases and estimates some statistics about the final graph. The sampler also samples some information about property data. Given the sampled statistics gathered and estimated by the data sampler, a graph size estimator estimates how much memory is required by the graph processing engine to load the graph. The final graph size represents how much memory will be used to keep the final graph structures in memory once loading is completed. The peak memory usage represents the memory usage upper bound that is reached by the graph processing engine during loading.

    INVALID TRAFFIC DETECTION USING EXPLAINABLE UNSUPERVISED GRAPH ML

    Publication number: US20250119453A1

    Publication date: 2025-04-10

    Application number: US18954031

    Application date: 2024-11-20

    Abstract: Herein are graph machine learning explainability (MLX) techniques for invalid traffic detection. In an embodiment, a computer generates a graph that contains: a) domain vertices that represent network domains that received requests and b) address vertices that respectively represent network addresses from which the requests originated. Based on the graph, domain embeddings are generated that respectively encode the domain vertices. Based on the domain embeddings, multidomain embeddings are generated that respectively encode the network addresses. The multidomain embeddings are organized into multiple clusters of multidomain embeddings. A particular cluster is detected as suspicious. In an embodiment, an unsupervised trained graph model generates the multidomain embeddings. Based on the clusters of multidomain embeddings, feature importances are unsupervised trained. Based on the feature importances, an explanation is automatically generated for why an object is or is not suspicious. The explained object may be a cluster or other batch of network addresses or a single network address.

    SUPERVISED MODEL SELECTION VIA DIVERSITY CRITERIA

    Publication number: US20250077876A1

    Publication date: 2025-03-06

    Application number: US18239416

    Application date: 2023-08-29

    Abstract: Techniques for selecting machine-learned (ML) models using diversity criteria are provided. In one technique, for each ML model of multiple ML models, output data is generated based on input data to the ML model. Multiple pairs of ML models are identified, where each ML model in the multiple pairs is from the multiple ML models. For each pair of ML models in the multiple pairs of ML models: (1) first output data that was previously generated by a first ML model in the pair is identified; (2) second output data that was previously generated by a second ML model in the pair is identified; (3) a diversity value that is based on the first and second output data is generated; and (4) the diversity value is added to a set of diversity values. A subset of the multiple ML models is selected based on the set of diversity values.
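
The pairwise procedure above can be illustrated with a short sketch. The diversity metric used here (fraction of inputs on which two models disagree) and the selection rule (the size-`k` subset with the highest mean pairwise diversity, found by exhaustive search) are assumptions for the example; the abstract leaves both open, and the function names are hypothetical.

```python
from itertools import combinations


def diversity(out_a, out_b):
    """Assumed diversity value: fraction of inputs where two models disagree."""
    return sum(a != b for a, b in zip(out_a, out_b)) / len(out_a)


def select_diverse(outputs, k):
    """Pick the k models (k >= 2) with the highest mean pairwise diversity.

    outputs maps model name -> predictions previously generated on shared input
    """
    # Steps (1)-(4): compute one diversity value per pair and collect them.
    pair_diversity = {
        frozenset(pair): diversity(outputs[pair[0]], outputs[pair[1]])
        for pair in combinations(outputs, 2)
    }

    def subset_score(subset):
        pairs = list(combinations(subset, 2))
        return sum(pair_diversity[frozenset(p)] for p in pairs) / len(pairs)

    # Exhaustive search is fine for a handful of models; it grows quickly.
    return max(combinations(sorted(outputs), k), key=subset_score)
```

Because the outputs are computed once and reused for every pair, the cost of model inference does not grow with the number of pairs.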

    SIMULTANEOUS DATA SAMPLING AND FEATURE SELECTION VIA WEAK LEARNERS

    Publication number: US20250013909A1

    Publication date: 2025-01-09

    Application number: US18218970

    Application date: 2023-07-06

    Abstract: From many features and many multidimensional points, a computer generates exploratory training configurations. Each point contains a value for each of the features. Each exploratory training configuration identifies a random subset of the features and a random subset of the points. A performance score is generated for each of the exploratory training configurations. A feature weight is generated for each of the features that is based on the performance scores of the exploratory training configurations whose random subset of features contains the feature. A point weight is generated for each of the points that is based on the performance scores of the exploratory training configurations whose random subset of the many points contains the point. A machine learning model is trained using an optimized training corpus that consists of a subset of the many features based on feature weight and a subset of the many points based on point weight.
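
The weighting scheme can be sketched as below. The sketch assumes that a weight is the mean performance score of the exploratory configurations that include the feature or point, and it takes the scoring step as a caller-supplied function `score_fn` (for example, validation accuracy of a weak learner). Both choices, and the names `explore` and `top_k`, are illustrative and not taken from the application.

```python
import random


def explore(n_features, n_points, n_configs, score_fn, rng):
    """Return (feature_weights, point_weights) from random training configurations.

    score_fn(feature_ids, point_ids) -> float is a hypothetical callback that
    trains and scores a weak learner on the given subset.
    """
    feature_scores = [[] for _ in range(n_features)]
    point_scores = [[] for _ in range(n_points)]
    for _ in range(n_configs):
        # Each exploratory configuration is a random subset of features and points.
        features = rng.sample(range(n_features), k=rng.randint(1, n_features))
        points = rng.sample(range(n_points), k=rng.randint(1, n_points))
        score = score_fn(features, points)
        for f in features:
            feature_scores[f].append(score)
        for p in points:
            point_scores[p].append(score)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return [mean(s) for s in feature_scores], [mean(s) for s in point_scores]


def top_k(weights, k):
    """Indices of the k largest weights, returned in ascending index order."""
    ranked = sorted(range(len(weights)), key=lambda i: -weights[i])
    return sorted(ranked[:k])
```

The optimized training corpus would then keep `top_k(feature_weights, ...)` as columns and `top_k(point_weights, ...)` as rows before the final model is trained.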

    SUBQUERIES IN DISTRIBUTED ASYNCHRONOUS GRAPH QUERIES

    Publication number: US20240220496A1

    Publication date: 2024-07-04

    Application number: US18091249

    Application date: 2022-12-29

    CPC classification number: G06F16/24535 G06F16/9024

    Abstract: A graph processing engine is provided for executing a graph query comprising a parent query and a subquery nested within the parent query. The subquery is an existential subquery, uses a reference to one or more correlated variables from the parent query, is inlined in the parent query pattern matching, does not have a post-processing phase, does not contain any global aggregation operations, uses a reference to at most one non-correlated variable, and does not include any filters on a non-correlated variable. Executing the graph query comprises initiating execution of the parent query, responsive to the parent query matching the one or more correlated variables in an intermediate result set, executing the subquery by applying a neighbor pattern matching operator that checks for existence of an edge, and resuming execution of the parent query based on results of the neighbor pattern matching operation.
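
To make the inlining idea concrete, the sketch below evaluates a parent pattern and treats an existential subquery over the same two correlated variables as a single edge-existence check. It is a single-machine toy: the edge labels `knows` and `follows`, the function name `run_query`, and the triple-based edge representation are all assumptions, and the distributed, asynchronous execution described in the abstract is not modeled.

```python
def run_query(edges):
    """Match (a)-[knows]->(b) WHERE EXISTS (a)-[follows]->(b).

    edges is a set of (source, label, destination) triples.
    """
    by_label = {}
    for src, label, dst in edges:
        by_label.setdefault(label, set()).add((src, dst))

    results = []
    # Parent query: pattern matching binds the correlated variables a and b.
    for a, b in sorted(by_label.get("knows", ())):
        # Inlined subquery: a neighbor check for one edge, with no separate
        # post-processing phase and no aggregation.
        if (a, b) in by_label.get("follows", ()):
            results.append((a, b))
        # Parent execution resumes with the next intermediate result.
    return results
```

Because the subquery reduces to a membership test, the parent query never has to materialize a separate subquery result set.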

    EXPERT-OPTIMAL CORRELATION: CONTAMINATION FACTOR IDENTIFICATION FOR UNSUPERVISED ANOMALY DETECTION

    Publication number: US20240095231A1

    Publication date: 2024-03-21

    Application number: US18075824

    Application date: 2022-12-06

    CPC classification number: G06F16/2365

    Abstract: In a computer, each of multiple anomaly detectors infers an anomaly score for each of many tuples. For each tuple, a synthetic label is generated that indicates for each anomaly detector: the anomaly detector, the anomaly score inferred by the anomaly detector for the tuple and, for each of multiple contamination factors, the contamination factor and, based on the contamination factor, a binary class of the anomaly score. For each particular anomaly detector excluding a best anomaly detector, a similarity score is measured for each contamination factor. The similarity score indicates how similar, between the particular anomaly detector and the best anomaly detector, are the binary classes of labels with that contamination factor. For each contamination factor, a combined similarity score is calculated based on the similarity scores for the contamination factor. Based on a contamination factor that has the highest combined similarity score, the computer detects that an additional anomaly detector is inaccurate.
