MULTIPLIER TUNING POSTPROCESSING FOR MACHINE LEARNING BIAS MITIGATION

    Publication number: US20240403674A1

    Publication date: 2024-12-05

    Application number: US18529300

    Filing date: 2023-12-05

    Abstract: In an embodiment, a computer infers, from an input (e.g. that represents a person) that contains a value of a sensitive feature that has a plurality of multipliers, a probability of a majority class (i.e. an outcome). Based on the value of the sensitive feature in the input, from the multipliers of the sensitive feature, a multiplier is selected that is specific to both of the sensitive feature and the value of the sensitive feature. The input is classified based on a multiplicative product of the probability of the majority class and the multiplier that is specific to both of the sensitive feature and the value of the sensitive feature. In an embodiment, a black-box bi-objective optimizer generates multipliers on a Pareto frontier from which a user may interactively select a combination of multipliers that provide a best tradeoff between fairness and accuracy.
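A minimal sketch of the multiplier step described in this abstract may help. It is an illustration under stated assumptions, not the patented implementation: the multiplier table, the feature name `gender`, the 0.5 decision threshold, and the function names `classify` and `pareto_front` are all hypothetical, and the multiplier values are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical multipliers keyed by (sensitive feature, value). The numbers
# are invented for illustration; in practice a tuning step would supply them.
MULTIPLIERS: Dict[Tuple[str, str], float] = {
    ("gender", "female"): 0.8,
    ("gender", "male"): 1.1,
}


def classify(prob_majority: float, feature: str, value: str,
             threshold: float = 0.5) -> str:
    """Scale the majority-class probability by the multiplier selected for
    this (feature, value) pair, then compare it to an assumed threshold."""
    multiplier = MULTIPLIERS[(feature, value)]
    adjusted = prob_majority * multiplier
    return "majority" if adjusted >= threshold else "minority"


def pareto_front(candidates: List[dict]) -> List[dict]:
    """Keep the candidates that no other candidate dominates on
    (fairness, accuracy), where higher is better for both scores."""
    front = []
    for c in candidates:
        dominated = any(
            o["fairness"] >= c["fairness"]
            and o["accuracy"] >= c["accuracy"]
            and (o["fairness"] > c["fairness"] or o["accuracy"] > c["accuracy"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front
```

The same inferred probability of 0.6 is classified differently depending on the selected multiplier: `classify(0.6, "gender", "female")` falls below the assumed threshold, while `classify(0.6, "gender", "male")` stays above it. `pareto_front` shows only the filtering of candidate multiplier sets by two scores; how the optimizer proposes candidates is not covered here.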

    THRESHOLD TUNING FOR IMBALANCED MULTI-CLASS CLASSIFICATION MODELS

    Publication number: US20240303541A1

    Publication date: 2024-09-12

    Application number: US18386196

    Filing date: 2023-11-01

    CPC classification number: G06N20/00 G06N7/01

    Abstract: In an embodiment, a computer generates, from an input, an inference that contains multiple probabilities respectively for multiple mutually exclusive classes that contain a first class and a second class. The probabilities contain (e.g. due to overfitting) a higher probability for the first class that is higher than a lower probability for the second class. In response to a threshold exceeding the higher probability, the input is automatically and more accurately classified as the second class. One, some, or almost all classes may have a respective distinct threshold that can be concurrently applied for acceleration. Data parallelism may simultaneously apply a threshold to a batch of multiple inputs for acceleration.
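The thresholding idea in this abstract can be sketched roughly as follows. This is one plausible reading, not the patented method: the function names, the rule of walking classes from most to least probable, and the fallback to the plain argmax when no class passes are all assumptions made for illustration.

```python
from typing import Dict, List


def classify_with_thresholds(probs: Dict[str, float],
                             thresholds: Dict[str, float]) -> str:
    """Return the most probable class whose probability reaches that
    class's threshold."""
    # Rank classes from most to least probable.
    ranked = sorted(probs, key=probs.get, reverse=True)
    for cls in ranked:
        # A class without a configured threshold is always eligible.
        if probs[cls] >= thresholds.get(cls, 0.0):
            return cls
    # Assumed fallback: no class passed its threshold, so keep the argmax.
    return ranked[0]


def classify_batch(batch: List[Dict[str, float]],
                   thresholds: Dict[str, float]) -> List[str]:
    """Apply the same thresholds to every input in a batch."""
    return [classify_with_thresholds(p, thresholds) for p in batch]
```

With `probs = {"common": 0.6, "rare": 0.4}` and a threshold of 0.7 on `"common"`, the threshold exceeds the higher probability, so the input is classified as `"rare"`. The batch helper is a plain loop; a vectorized version would be the natural place for the data parallelism the abstract mentions.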

    AUTOMLX COUNTERFACTUAL EXPLAINER (ACE)
    Document type: Invention publication

    Publication number: US20240303515A1

    Publication date: 2024-09-12

    Application number: US18512438

    Filing date: 2023-11-17

    CPC classification number: G06N5/04

    Abstract: A computer stores a reference corpus that consists of many reference points that each has a respective class. Later, an expected class and a subject point (i.e. instance to explain) that does not have the expected class are received. Multiple reference points that have the expected class are selected as starting points. Based on the subject point and the starting points, multiple discrete interpolated points are generated that have the expected class. Based on the subject point and the discrete interpolated points, multiple continuous interpolated points are generated that have the expected class. A counterfactual explanation of why the subject point does not have the expected class is directly generated based on continuous interpolated point(s) and, thus, indirectly generated based on the discrete interpolated points. For acceleration, neither way of interpolation (i.e. counterfactual generation) is iterative. Generated interpolated points can be reused to amortize resources consumed while generating counterfactuals.
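The abstract does not spell out how the two kinds of interpolation work, so the following is only a loose sketch under assumptions. Here "discrete" interpolation is assumed to mean taking each feature wholesale from either the subject or a starting point, "continuous" interpolation is assumed to mean sampling a fixed grid on the segment between the subject and a discrete point, and the closest surviving point by L1 distance is assumed to be the counterfactual. The `predict` callback and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def discrete_interpolations(subject: Point, start: Point,
                            predict: Callable[[Point], str],
                            expected: str) -> List[Point]:
    """Every mix that copies each feature from the subject or the start,
    kept only if it has the expected class."""
    subject = tuple(subject)
    found = []
    for mask in product((False, True), repeat=len(subject)):
        p = tuple(s if take else x for x, s, take in zip(subject, start, mask))
        if p != subject and predict(p) == expected:
            found.append(p)
    return found


def continuous_interpolations(subject: Point, target: Point,
                              predict: Callable[[Point], str],
                              expected: str, steps: int = 10) -> List[Point]:
    """Evenly spaced points on the segment subject -> target, evaluated in a
    single pass over a fixed grid (no iterative search)."""
    found = []
    for k in range(1, steps + 1):
        t = k / steps
        p = tuple(x + t * (y - x) for x, y in zip(subject, target))
        if predict(p) == expected:
            found.append(p)
    return found


def counterfactual(subject: Point, starts: Sequence[Point],
                   predict: Callable[[Point], str],
                   expected: str) -> Optional[Point]:
    """Closest (L1) continuous interpolated point with the expected class,
    or None if no candidate was found."""
    candidates: List[Point] = []
    for start in starts:
        for d in discrete_interpolations(subject, start, predict, expected):
            candidates.extend(
                continuous_interpolations(subject, d, predict, expected))
    return min(candidates,
               key=lambda p: sum(abs(a - b) for a, b in zip(subject, p)),
               default=None)
```

The explanation would then be the difference between the subject and the returned point, for example which features moved and by how much. The candidate lists could be cached and reused across subjects, which is one way to read the abstract's remark about amortizing resources.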

    Fast and accurate anomaly detection explanations with forward-backward feature importance

    Publication number: US11966275B2

    Publication date: 2024-04-23

    Application number: US17992743

    Filing date: 2022-11-22

    CPC classification number: G06F11/006 G06N20/00 G06F2201/82

    Abstract: The present invention relates to machine learning (ML) explainability (MLX). Herein are local explanation techniques for black box ML models based on coalitions of features in a dataset. In an embodiment, a computer receives a request to generate a local explanation of which coalitions of features caused an anomaly detector to detect an anomaly. During unsupervised generation of a new coalition, a first feature is randomly selected from features in a dataset. Which additional features in the dataset can join the coalition, because they have mutual information with the first feature that exceeds a threshold, is detected. For each feature that is not in the coalition, values of the feature are permuted in imperfect copies of original tuples in the dataset. An average anomaly score of the imperfect copies is measured. Based on the average anomaly score of the imperfect copies, a local explanation is generated that references (e.g. defines) the coalition.
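The coalition steps in this abstract can be sketched as below. This is a simplified illustration that assumes discrete features, rows stored as dictionaries, and a hypothetical `anomaly_score` callback standing in for the black-box detector; the function names are invented, and the forward-backward search named in the title is not shown.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete value sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def build_coalition(rows, mi_threshold, rng):
    """Pick a random first feature, then add every feature whose mutual
    information with it exceeds the threshold."""
    features = sorted(rows[0])
    first = rng.choice(features)
    first_values = [r[first] for r in rows]
    coalition = {first}
    for name in features:
        if name == first:
            continue
        values = [r[name] for r in rows]
        if mutual_information(first_values, values) > mi_threshold:
            coalition.add(name)
    return coalition


def average_score_outside_permuted(rows, coalition, anomaly_score, rng):
    """Permute each feature outside the coalition across imperfect copies
    of the rows, then average the detector's score over those copies."""
    copies = [dict(r) for r in rows]
    for name in sorted(rows[0]):
        if name in coalition:
            continue
        shuffled = [r[name] for r in rows]
        rng.shuffle(shuffled)
        for copy, value in zip(copies, shuffled):
            copy[name] = value
    return sum(anomaly_score(c) for c in copies) / len(copies)
```

If the average score stays close to the original score while everything outside the coalition is scrambled, the coalition's features are plausibly what the detector reacted to. Comparing that average across several candidate coalitions is one way to rank them for the local explanation.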

    Access-frequency-based entity replication techniques for distributed property graphs with schema

    Publication number: US11907255B2

    Publication date: 2024-02-20

    Application number: US17686938

    Filing date: 2022-03-04

    CPC classification number: G06F16/27 G06F16/2282 G06F16/284

    Abstract: In an embodiment, multiple computers cooperate to retrieve content from tables in a relational database. Each table contains respective rows. Each row contains a vertex of a graph. Many high-degree vertices are identified. Each high-degree vertex is connected to respective edges in the graph. A count of the edges of each high-degree vertex exceeds a degree threshold. A central computer detects that all vertices in a high-degree subset of tables are high-degree vertices. Based on detecting the high-degree subset of tables, multiple vertices of the graph that are not in the high-degree subset of tables are replicated. Within local storage capacity limits of the computers, this degree-based replication may be supplemented with other vertex replication strategies that are schema based, content based, or workload based. This intelligent selective replication maximizes system throughput by minimizing graph data access latency based on data locality.
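The detection step in this abstract can be sketched as follows, reading the text literally. The data layout (a dictionary from table name to vertex ids, plus a list of edges), the undirected degree count, and the function names are assumptions. The abstract does not say which of the remaining vertices end up replicated, so the sketch only lists the candidates.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def high_degree_tables(tables: Dict[str, List[str]],
                       edges: List[Tuple[str, str]],
                       degree_threshold: int) -> Set[str]:
    """Names of tables in which every vertex has more edges than the
    degree threshold."""
    degree = Counter()
    for src, dst in edges:
        # Assumption: an edge counts toward the degree of both endpoints.
        degree[src] += 1
        degree[dst] += 1
    return {
        name for name, vertices in tables.items()
        if vertices and all(degree[v] > degree_threshold for v in vertices)
    }


def replication_candidates(tables: Dict[str, List[str]],
                           edges: List[Tuple[str, str]],
                           degree_threshold: int) -> List[str]:
    """Vertices that are not stored in any high-degree table."""
    hot = high_degree_tables(tables, edges, degree_threshold)
    return sorted(v for name, vertices in tables.items()
                  if name not in hot for v in vertices)
```

A real system would further cap the candidate set by each computer's local storage limit and combine it with the schema-based, content-based, or workload-based strategies the abstract lists.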

    ACCESS-FREQUENCY-BASED ENTITY REPLICATION TECHNIQUES FOR DISTRIBUTED PROPERTY GRAPHS WITH SCHEMA

    Publication number: US20230281219A1

    Publication date: 2023-09-07

    Application number: US17686938

    Filing date: 2022-03-04

    CPC classification number: G06F16/27 G06F16/284 G06F16/2282

    Abstract: In an embodiment, multiple computers cooperate to retrieve content from tables in a relational database. Each table contains respective rows. Each row contains a vertex of a graph. Many high-degree vertices are identified. Each high-degree vertex is connected to respective edges in the graph. A count of the edges of each high-degree vertex exceeds a degree threshold. A central computer detects that all vertices in a high-degree subset of tables are high-degree vertices. Based on detecting the high-degree subset of tables, multiple vertices of the graph that are not in the high-degree subset of tables are replicated. Within local storage capacity limits of the computers, this degree-based replication may be supplemented with other vertex replication strategies that are schema based, content based, or workload based. This intelligent selective replication maximizes system throughput by minimizing graph data access latency based on data locality.
