UNIFY95: META-LEARNING CONTAMINATION THRESHOLDS FROM UNIFIED ANOMALY SCORES

    Publication No.: US20240095580A1

    Publication date: 2024-03-21

    Application No.: US17994530

    Filing date: 2022-11-28

    CPC classification number: G06N20/00

    Abstract: Herein is a universal anomaly threshold based on several labeled datasets and transformation of anomaly scores from one or more anomaly detectors. In an embodiment, a computer meta-learns from each anomaly detection algorithm and each labeled dataset as follows. A respective anomaly detector based on the anomaly detection algorithm is trained based on the dataset. The anomaly detector infers respective anomaly scores for tuples in the dataset. The following are ensured in the anomaly scores from the anomaly detector: i) regularity that an anomaly score of zero cannot indicate an anomaly and ii) normality that an inclusive range of zero to one contains the anomaly scores from the anomaly detector. A respective anomaly threshold is calculated for the anomaly scores from the anomaly detector. After all meta-learning, a universal anomaly threshold is calculated as an average of the anomaly thresholds. An anomaly is detected based on the universal anomaly threshold.
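The averaging step in this abstract can be illustrated with a minimal sketch. The abstract does not say how scores are transformed or how each per-detector threshold is chosen, so two assumptions are made here: min-max scaling stands in for the transformation (it maps the lowest raw score to zero and keeps all scores in [0, 1]), and the per-run threshold is the candidate that maximizes F1 against the labels. The names `normalize`, `best_threshold`, `universal_threshold`, and the toy data are hypothetical.

```python
from statistics import mean

def normalize(scores):
    """Min-max scale raw scores into [0, 1]; the lowest raw score maps to 0 (assumed transform)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def best_threshold(scores, labels):
    """Return the candidate threshold with the highest F1 against the labels (assumed criterion)."""
    best_t, best_f1 = 1.0, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def universal_threshold(runs):
    """Average the per-run thresholds; each run is (raw_scores, labels) for one detector/dataset pair."""
    return mean(best_threshold(normalize(scores), labels) for scores, labels in runs)

# Toy scores from two hypothetical detector/dataset pairs on different raw scales.
runs = [
    ([0.1, 0.2, 0.15, 0.9], [0, 0, 0, 1]),
    ([5.0, 6.0, 20.0, 25.0], [0, 0, 1, 1]),
]
threshold = universal_threshold(runs)
```

Under these assumptions, a new tuple would be flagged as anomalous when its normalized score is at or above `threshold`, regardless of which detector produced it.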

    AUTOMATED DATASET DRIFT DETECTION
    Invention application

    Publication No.: US20230139718A1

    Publication date: 2023-05-04

    Application No.: US17513760

    Filing date: 2021-10-28

    Abstract: Herein are acceleration and increased reliability based on classification and scoring techniques for machine learning that compare two similar datasets of different ages to detect data drift without a predefined drift threshold. Various subsets are randomly sampled from the datasets. The subsets are combined in various ways to generate subsets of various age mixtures. In an embodiment, ages are permuted and drift is detected based on whether or not fitness scores indicate that an age binary classifier is confused. In an embodiment, an anomaly detector measures outlier scores of two subsets of different age mixtures. Drift is detected when the outlier scores diverge. In a two-arm bandit embodiment, iterations randomly alternate between both datasets based on respective probabilities that are adjusted by a bandit reward based on outlier scores from an anomaly detector. Drift is detected based on the probability of the younger dataset.
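The "confused age classifier" idea can be sketched roughly as follows. This is a simplification under stated assumptions, not the claimed method: the data is one-dimensional, the age classifier is a nearest-centroid rule, and accuracies obtained with permuted ages serve as the baseline in place of a fixed drift threshold. The sampling of subsets with various age mixtures is not reproduced. The function names, the `permutations` parameter, and the synthetic data are hypothetical.

```python
import random
from statistics import mean

def age_classifier_accuracy(rows, rng):
    """Train a 1-D nearest-centroid age classifier on half of the rows; score it on the rest."""
    rows = rows[:]
    rng.shuffle(rows)
    half = len(rows) // 2
    train, test = rows[:half], rows[half:]
    old_c = mean(x for x, age in train if age == 0)
    new_c = mean(x for x, age in train if age == 1)
    hits = sum(
        1 for x, age in test
        if (abs(x - new_c) < abs(x - old_c)) == (age == 1)
    )
    return hits / len(test)

def drift_detected(old, new, rng, permutations=20):
    """Report drift when the real accuracy beats every accuracy obtained with permuted ages."""
    rows = [(x, 0) for x in old] + [(x, 1) for x in new]
    real = age_classifier_accuracy(rows, rng)
    baseline = []
    for _ in range(permutations):
        ages = [age for _, age in rows]
        rng.shuffle(ages)  # permuting ages destroys any real age signal
        permuted = [(x, a) for (x, _), a in zip(rows, ages)]
        baseline.append(age_classifier_accuracy(permuted, rng))
    # A confused classifier does no better on real ages than on permuted ones.
    return real > max(baseline)

rng = random.Random(0)
old = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(5.0, 1.0) for _ in range(200)]
drifted = drift_detected(old, shifted, rng)
```

When the two datasets come from the same distribution, the real accuracy should fall inside the spread of the permuted baseline, so the classifier counts as confused and no drift is reported.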

    MITIGATING BIAS IN MACHINE LEARNING WITHOUT POSITIVE OUTCOME RATE REGRESSIONS

    Publication No.: US20250139474A1

    Publication date: 2025-05-01

    Application No.: US18544899

    Filing date: 2023-12-19

    Abstract: A computer obtains multipliers of a sensitive feature. From an input that contains a value of the feature, a probability of a class is inferred. Based on the value of the feature in the input, one of the multipliers of the feature is selected. The multiplier is specific to both the feature and the value of the feature. The input is classified based on a multiplicative product of the probability of the class and the selected multiplier. In an embodiment, a black-box tri-objective optimizer generates multipliers on a three-way Pareto frontier from which a user may interactively select a combination of multipliers that provides a best three-way tradeoff among the three objectives. The optimizer has three objectives to respectively optimize three distinct validation metrics that may, for example, be accuracy, fairness, and favorable outcome rate decrease.
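The classification step described above amounts to scaling an inferred probability by a per-value multiplier. A minimal sketch follows, assuming a binary decision with a 0.5 cutoff. The cutoff, the group names, and the multiplier values are hypothetical and are not taken from the patent; the optimizer that would produce the multipliers is not shown.

```python
def classify(prob_positive, feature_value, multipliers, cutoff=0.5):
    """Scale the inferred class probability by the multiplier for this feature value."""
    adjusted = prob_positive * multipliers[feature_value]
    return adjusted >= cutoff

# Hypothetical multipliers, one per value of the sensitive feature.
multipliers = {"group_a": 1.0, "group_b": 1.2}

# The same inferred probability can lead to different decisions per group.
decisions = [
    classify(0.45, "group_a", multipliers),
    classify(0.45, "group_b", multipliers),
]
```

Because the adjustment is applied after inference, the underlying model stays untouched and can be treated as a black box.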

    ACCELERATING AUTOMATED ALGORITHM CONFIGURATION USING HISTORICAL PERFORMANCE DATA

    Publication No.: US20240394557A1

    Publication date: 2024-11-28

    Application No.: US18202472

    Filing date: 2023-05-26

    Abstract: In an embodiment, a computer combines first original hyperparameters and second original hyperparameters into combined hyperparameters. Each iteration of a binary search that selects hyperparameters chooses: a) important hyperparameters from the combined hyperparameters and b) which single boundary of the binary search to adjust, based on the estimated complexity decrease from including only the important hyperparameters rather than all of the combined hyperparameters. For the important hyperparameters of the last iteration of the binary search, a pruned value range of a particular hyperparameter is generated based on a first original value range of the particular hyperparameter for the first original hyperparameters and a second original value range of the same particular hyperparameter for the second original hyperparameters. To accelerate hyperparameter optimization (HPO), the particular hyperparameter is tuned only within the pruned value range to discover an optimal value for configuring and training a machine learning model.

    FAST AND ACCURATE ANOMALY DETECTION EXPLANATIONS WITH FORWARD-BACKWARD FEATURE IMPORTANCE

    Publication No.: US20230376366A1

    Publication date: 2023-11-23

    Application No.: US17992743

    Filing date: 2022-11-22

    CPC classification number: G06F11/006 G06N20/00 G06F2201/82

    Abstract: The present invention relates to machine learning (ML) explainability (MLX). Herein are local explanation techniques for black box ML models based on coalitions of features in a dataset. In an embodiment, a computer receives a request to generate a local explanation of which coalitions of features caused an anomaly detector to detect an anomaly. During unsupervised generation of a new coalition, a first feature is randomly selected from features in a dataset. The computer then detects which additional features in the dataset can join the coalition because their mutual information with the first feature exceeds a threshold. For each feature that is not in the coalition, values of the feature are permuted in imperfect copies of original tuples in the dataset. An average anomaly score of the imperfect copies is measured. Based on the average anomaly score of the imperfect copies, a local explanation is generated that references (e.g. defines) the coalition.
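The coalition-building step can be sketched for discrete features as follows. This covers only the mutual-information test; the permutation of non-coalition features and the anomaly scoring are not shown. The first feature is passed in explicitly for reproducibility, whereas the abstract selects it at random. The function names, the threshold value, and the toy columns are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete feature columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def build_coalition(columns, first, threshold):
    """Start from `first` and admit every feature whose MI with it exceeds `threshold`."""
    coalition = [first]
    for name, values in columns.items():
        if name != first and mutual_information(columns[first], values) > threshold:
            coalition.append(name)
    return coalition

# Toy dataset: "b" duplicates "a", while "c" is independent of "a".
columns = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [0, 1, 0, 1],
}
coalition = build_coalition(columns, first="a", threshold=0.1)
```

In this toy data, "b" joins the coalition because it carries the same information as "a", and "c" stays out because it shares none.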

    ONE-PASS APPROACH TO AUTOMATED TIMESERIES FORECASTING

    Publication No.: US20230153394A1

    Publication date: 2023-05-18

    Application No.: US17528305

    Filing date: 2021-11-17

    Abstract: Herein are timeseries preprocessing, model selection, and hyperparameter tuning techniques for forecasting development based on temporal statistics of a timeseries and a single feed-forward pass through a machine learning (ML) pipeline. In an embodiment, a computer hosts and operates the ML pipeline that automatically measures temporal statistic(s) of a timeseries. ML algorithm selection, cross validation, and hyperparameter tuning are based on the temporal statistics of the timeseries. The result from the ML pipeline is a rigorously trained and production-ready ML model that is validated to have increased accuracy for multiple prediction horizons. Based on the temporal statistics, efficiency is achieved by asymmetry of investment of computer resources in the tuning and training of the most promising ML algorithm(s). Compared to other approaches, this ML pipeline produces a more accurate ML model for a given amount of computer resources and consumes fewer computer resources to achieve a given accuracy.
