FAIRNESS FEATURE IMPORTANCE: UNDERSTANDING AND MITIGATING UNJUSTIFIABLE BIAS IN MACHINE LEARNING MODELS

    Publication number: US20250094862A1

    Publication date: 2025-03-20

    Application number: US18529182

    Application date: 2023-12-05

    Abstract: In an embodiment, a computer generates a respective original inference from each of many records. Permuted values are selected for a feature from original values of the feature. Based on the permuted values for the feature, a permuted inference is generated from each record. Fairness and accuracy of the original and permuted inferences are measured. For each of many features, the computer measures a respective impact on fairness of a machine learning model, and a respective impact on accuracy of the machine learning model. A global explanation of the machine learning model is generated and presented based on, for multiple features, the impacts on fairness and accuracy. Based on the global explanation, an interactive indication to exclude or include a particular feature is received. The machine learning model is (re-)trained based on the interactive indication to exclude or include the particular feature, which may increase the fairness of the model.
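
The permute-and-remeasure loop in this abstract can be sketched in a few lines. This is a minimal illustration, not the claimed method. The abstract does not name a fairness metric, so the demographic parity gap used here is an assumption, and `permutation_impacts`, `parity_gap`, `toy_model`, and the toy data are hypothetical names introduced for the example.

```python
import random
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
Model = Callable[[Record], int]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def parity_gap(preds: Sequence[int], groups: Sequence[int]) -> float:
    """Assumed fairness metric: absolute difference in positive-prediction
    rate between group 0 and group 1 (0.0 means no gap)."""
    rates = []
    for g in (0, 1):
        member_preds = [p for p, grp in zip(preds, groups) if grp == g]
        rates.append(sum(member_preds) / len(member_preds))
    return abs(rates[0] - rates[1])


def permutation_impacts(model: Model, records: List[Record],
                        labels: Sequence[int], groups: Sequence[int],
                        features: Sequence[str],
                        seed: int = 0) -> Dict[str, Dict[str, float]]:
    """For each feature, shuffle its values across records and measure how
    accuracy and the parity gap change relative to the original inferences."""
    rng = random.Random(seed)
    base_preds = [model(r) for r in records]
    base_acc = accuracy(base_preds, labels)
    base_gap = parity_gap(base_preds, groups)
    impacts = {}
    for f in features:
        # Permuted values are drawn from the feature's own original values.
        shuffled = [r[f] for r in records]
        rng.shuffle(shuffled)
        perm_preds = [model({**r, f: v}) for r, v in zip(records, shuffled)]
        impacts[f] = {
            "accuracy_drop": base_acc - accuracy(perm_preds, labels),
            "unfairness_drop": base_gap - parity_gap(perm_preds, groups),
        }
    return impacts


# Toy data: the hypothetical model uses "income" and ignores "shoe_size".
incomes = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.2, 0.1]
records = [{"income": x, "shoe_size": float(i)} for i, x in enumerate(incomes)]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]


def toy_model(r: Record) -> int:
    return int(r["income"] > 0.5)


impacts = permutation_impacts(toy_model, records, labels, groups,
                              ["income", "shoe_size"])
```

Under this reading, a feature whose permutation shrinks the parity gap a lot while costing little accuracy is a candidate for the interactive exclusion step before retraining.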

    FLOATING POINT UNIT WITH SUPPORT FOR VARIABLE LENGTH NUMBERS
    Type: invention application (status: in force)

    Publication number: US20150254065A1

    Publication date: 2015-09-10

    Application number: US14198746

    Application date: 2014-03-06

    Abstract: Embodiments of a processor are disclosed for performing arithmetic operations on a machine independent number format. The processor may include a floating point unit, and a number unit. The number format may include a sign/exponent block, a length block, and multiple mantissa digits. The number unit may be configured to perform an operation on two operands by converting the digit format of each mantissa digit of each operand, to perform the operation using the converted mantissa digits, and then to convert each mantissa digit of the result of the operation back into the original digit format.
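
The convert, operate, convert-back flow described for the number unit can be modeled in software. This is a rough sketch only: the abstract does not specify the digit encoding, so the base-100 digits with a stored bias of 1 are assumptions. The separate `negative` and `exponent` fields also stand in for the packed sign/exponent block, and all names (`VarNum`, `pack`, `multiply`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

OFFSET = 1  # assumed bias added to each base-100 digit when stored


@dataclass
class VarNum:
    negative: bool     # stands in for the sign part of the sign/exponent block
    exponent: int      # power of 100 applied to the mantissa integer
    length: int        # length block: number of mantissa digits
    digits: List[int]  # stored (biased) digits, most significant first


def to_raw(stored: List[int]) -> List[int]:
    """Convert stored digits into plain base-100 digits for arithmetic."""
    return [d - OFFSET for d in stored]


def to_stored(raw: List[int]) -> List[int]:
    """Convert plain base-100 digits back into the stored digit format."""
    return [d + OFFSET for d in raw]


def pack(negative: bool, mantissa: int, exponent: int) -> VarNum:
    """Build a VarNum from a non-negative mantissa integer, dropping
    trailing zero digits so the length varies with the value."""
    if mantissa == 0:
        return VarNum(False, 0, 0, [])
    while mantissa % 100 == 0:
        mantissa //= 100
        exponent += 1
    raw = []
    while mantissa:
        raw.append(mantissa % 100)
        mantissa //= 100
    raw.reverse()
    return VarNum(negative, exponent, len(raw), to_stored(raw))


def mantissa_of(n: VarNum) -> int:
    """Combine the converted digits into one integer."""
    value = 0
    for d in to_raw(n.digits):
        value = value * 100 + d
    return value


def multiply(a: VarNum, b: VarNum) -> VarNum:
    """Convert both operands' digits, multiply, then convert the result's
    digits back into the stored format."""
    return pack(a.negative != b.negative,
                mantissa_of(a) * mantissa_of(b),
                a.exponent + b.exponent)


a = pack(False, 1250, -1)  # 12.5 as 1250 * 100**-1
b = pack(True, 4, 0)       # -4
product = multiply(a, b)   # -50 as 50 * 100**0
```

A hardware number unit would operate on the digits directly rather than through a big integer; the sketch only shows where the two format conversions sit relative to the operation.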


    LEARNING HYPER-PARAMETER SCALING MODELS FOR UNSUPERVISED ANOMALY DETECTION

    Publication number: US20240095604A1

    Publication date: 2024-03-21

    Application number: US18075784

    Application date: 2022-12-06

    CPC classification number: G06N20/20

    Abstract: A computer sorts empirical validation scores of validated training scenarios of an anomaly detector. Each training scenario has a dataset to train an instance of the anomaly detector that is configured with values for hyperparameters. Each dataset has values for metafeatures. For each predefined ranking percentage, a subset of best training scenarios is selected that consists of the ranking percentage of validated training scenarios having the highest empirical validation scores. Linear optimizers are trained to infer a value for a hyperparameter. Many distinct unvalidated training scenarios are generated, each having metafeature values and hyperparameter values that include the value inferred for that hyperparameter by a linear optimizer. For each unvalidated training scenario, a validation score is inferred. The linear optimizer having the highest combined inferred validation score is selected as the best. For a new dataset, the best linear optimizer infers a value of that hyperparameter.
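
The selection procedure above can be sketched with one metafeature and one hyperparameter. This is a simplified illustration under several assumptions: each "linear optimizer" is modeled as a one-dimensional least-squares line, the combined score is taken as a mean, and the score surrogate `infer_score` is a hypothetical callable supplied by the caller, since the abstract does not say how validation scores are inferred.

```python
import math
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Scenario = Tuple[float, float, float]  # (metafeature, hyperparameter, score)
Line = Tuple[float, float]             # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least squares for hyperparameter = slope * metafeature + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def best_scaling_model(validated: List[Scenario],
                       percentages: Sequence[float],
                       probe_metafeatures: Sequence[float],
                       infer_score: Callable[[float, float], float]
                       ) -> Optional[Line]:
    """Fit one line per ranking percentage and keep the line whose generated
    (unvalidated) scenarios receive the highest mean inferred score."""
    ranked = sorted(validated, key=lambda s: s[2], reverse=True)
    best_line, best_combined = None, -math.inf
    for pct in percentages:
        # Keep at least two scenarios so a line can be fitted (assumption).
        top = ranked[:max(2, math.ceil(pct * len(ranked)))]
        slope, intercept = fit_line([s[0] for s in top], [s[1] for s in top])
        combined = mean(infer_score(m, slope * m + intercept)
                        for m in probe_metafeatures)
        if combined > best_combined:
            best_line, best_combined = (slope, intercept), combined
    return best_line


# Toy data: the four highest-scoring scenarios lie on hyper = 2 * meta + 1.
validated = [
    (1.0, 3.0, 0.90), (2.0, 5.0, 0.80), (3.0, 7.0, 0.95), (4.0, 9.0, 0.85),
    (1.0, 10.0, 0.20), (2.0, 0.0, 0.10), (3.0, 1.0, 0.30), (4.0, 20.0, 0.15),
]


def toy_surrogate(meta: float, hyper: float) -> float:
    """Hypothetical score surrogate that peaks on hyper = 2 * meta + 1."""
    return -(hyper - (2.0 * meta + 1.0)) ** 2


slope, intercept = best_scaling_model(validated, [0.5, 1.0], [5.0, 6.0],
                                      toy_surrogate)
new_hyper = slope * 5.0 + intercept  # value inferred for a new dataset
```

In the toy data, fitting only the top half recovers the underlying line, while fitting all scenarios is pulled off by the low-scoring ones, so the 50% model wins.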

    CHROMOSOME REPRESENTATION LEARNING IN EVOLUTIONARY OPTIMIZATION TO EXPLOIT THE STRUCTURE OF ALGORITHM CONFIGURATION

    Publication number: US20240070471A1

    Publication date: 2024-02-29

    Application number: US17900779

    Application date: 2022-08-31

    CPC classification number: G06N3/126

    Abstract: Principal component analysis (PCA) accelerates and increases accuracy of genetic algorithms. In an embodiment, a computer generates many original chromosomes. Each original chromosome contains a sequence of original values. Each position in the sequences in the original chromosomes corresponds to only one respective distinct parameter in a set of parameters to be optimized. Based on the original chromosomes, many virtual chromosomes are generated. Each virtual chromosome contains a sequence of numeric values. Positions in the sequences in the virtual chromosomes do not correspond to only one respective distinct parameter in the set of parameters to be optimized. Based on the virtual chromosomes, many new chromosomes are generated. Each new chromosome contains a sequence of values. Each position in the sequences in the new chromosomes corresponds to only one respective distinct parameter in the set of parameters to be optimized. The computer may be configured based on a best new chromosome.

    Automatic feature subset selection based on meta-learning

    Publication number: US11615265B2

    Publication date: 2023-03-28

    Application number: US16547312

    Application date: 2019-08-21

    Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-features values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.

    Expert-optimal correlation: contamination factor identification for unsupervised anomaly detection

    Publication number: US12299553B2

    Publication date: 2025-05-13

    Application number: US18075824

    Application date: 2022-12-06

    Abstract: In a computer, each of multiple anomaly detectors infers an anomaly score for each of many tuples. For each tuple, a synthetic label is generated that indicates for each anomaly detector: the anomaly detector, the anomaly score inferred by the anomaly detector for the tuple and, for each of multiple contamination factors, the contamination factor and, based on the contamination factor, a binary class of the anomaly score. For each particular anomaly detector excluding a best anomaly detector, a similarity score is measured for each contamination factor. The similarity score indicates how similar, between the particular anomaly detector and the best anomaly detector, are the binary classes of labels with that contamination factor. For each contamination factor, a combined similarity score is calculated based on the similarity scores for the contamination factor. Based on a contamination factor that has the highest combined similarity score, the computer detects that an additional anomaly detector is inaccurate.
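
The contamination-factor search in this abstract can be sketched as follows. It is an illustration only: the similarity measure (fraction of tuples with matching binary classes), the mean used to combine similarities, and the thresholding rule in `binarize` are assumptions, and the detector names and scores are invented toy data. How the best detector is identified is not stated in the abstract, so it is passed in.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def binarize(scores: Sequence[float], contamination: float) -> List[int]:
    """Label the top `contamination` fraction of scores as anomalous (1)."""
    k = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [int(s >= cutoff) for s in scores]


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity: fraction of tuples with the same binary class."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_contamination(scores_by_detector: Dict[str, List[float]],
                       best: str,
                       factors: Sequence[float]
                       ) -> Tuple[float, Dict[float, float]]:
    """Return the factor at which the other detectors' binary classes agree
    most with the best detector's, plus all combined similarities."""
    combined = {}
    for c in factors:
        reference = binarize(scores_by_detector[best], c)
        combined[c] = mean(
            agreement(binarize(scores, c), reference)
            for name, scores in scores_by_detector.items() if name != best
        )
    return max(combined, key=combined.get), combined


# Toy scores for ten tuples from three hypothetical detectors.
scores_by_detector = {
    "best": [0.9, 0.8, 0.5, 0.1, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17],
    "det_a": [0.7, 0.95, 0.1, 0.2, 0.3, 0.6, 0.15, 0.25, 0.05, 0.12],
    "det_b": [0.85, 0.6, 0.2, 0.1, 0.3, 0.15, 0.25, 0.5, 0.05, 0.12],
}
factor, combined = pick_contamination(scores_by_detector, "best",
                                      [0.1, 0.2, 0.3])
```

With the chosen factor fixed, an additional detector whose binary classes agree poorly with the best detector's could then be flagged as inaccurate, which is one reading of the abstract's final step.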
