-
Publication number: US20250068979A1
Publication date: 2025-02-27
Application number: US18942116
Application date: 2024-11-08
Applicant: Oracle International Corporation
Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias and the input data sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
-
Publication number: US11921687B2
Publication date: 2024-03-05
Application number: US16436770
Application date: 2019-06-10
Applicant: Oracle International Corporation
IPC: G06F16/22 , G06F17/18 , G06F18/22 , G06F18/231
CPC classification number: G06F16/2228 , G06F17/18 , G06F18/22 , G06F18/231
Abstract: A first set and a second set are identified as operands for a set operation of a similarity analysis task iteration. Using respective minimum hash information arrays and contributor count arrays of the two sets, a minimum hash information array and a contributor count array of a derived set resulting from the set operation are generated. An entry in the contributor count array of the derived set indicates the number of child sets of the derived set that meet a criterion with respect to a corresponding entry in the minimum hash information array of the derived set. The generated minimum hash information array and the contributor count array are stored as part of input for a subsequent iteration. After a termination criterion of the task is met, output of the task is stored.
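As a rough illustration of the array bookkeeping this abstract describes, the Python sketch below merges the minimum hash arrays and contributor count arrays of two sets for a union operation. It is not taken from the patent: the function name `merge_union` is hypothetical, and the sketch assumes that the "criterion" is that a child set's minimum hash equals the derived set's minimum hash at that position.

```python
from typing import List, Sequence, Tuple


def merge_union(
    min_a: Sequence[int],
    cnt_a: Sequence[int],
    min_b: Sequence[int],
    cnt_b: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Derive min-hash and contributor-count arrays for the union of two sets.

    Assumption (not from the source): a child set "contributes" to an entry
    when its own minimum hash equals the derived set's minimum hash there.
    """
    if not (len(min_a) == len(cnt_a) == len(min_b) == len(cnt_b)):
        raise ValueError("all arrays must have the same length")

    merged_min: List[int] = []
    merged_cnt: List[int] = []
    for ha, ca, hb, cb in zip(min_a, cnt_a, min_b, cnt_b):
        if ha < hb:
            # Only the contributors of the first set achieve the minimum.
            merged_min.append(ha)
            merged_cnt.append(ca)
        elif hb < ha:
            # Only the contributors of the second set achieve the minimum.
            merged_min.append(hb)
            merged_cnt.append(cb)
        else:
            # Tie: contributors from both sets achieve the minimum.
            merged_min.append(ha)
            merged_cnt.append(ca + cb)
    return merged_min, merged_cnt
```

The returned pair can be fed into another call to `merge_union`, which mirrors the abstract's point that the derived arrays serve as input for a subsequent iteration.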
-
Publication number: US11416500B2
Publication date: 2022-08-16
Application number: US16781961
Application date: 2020-02-04
Applicant: Oracle International Corporation
IPC: G06F16/2457 , G06N20/00 , G06N20/20 , G06F17/18 , G06K9/62
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
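To make the idea of a Bayes factor for demographic parity concrete, the Python sketch below compares two hypotheses about how often members of two groups appear in a favorable position: one shared rate versus two separate rates. This is a generic Beta-Binomial Bayes factor chosen for illustration, not the specific factors claimed in the patent; the function names, the Beta(1, 1) prior, and the counts in the usage note are all assumptions.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def parity_bayes_factor(
    k1: int, n1: int, k2: int, n2: int, alpha: float = 1.0, beta: float = 1.0
) -> float:
    """Bayes factor for 'two different rates' over 'one shared rate'.

    k1 of n1 members of group 1, and k2 of n2 members of group 2, received
    the favorable outcome. Values above 1 favor a disparity between groups.
    The binomial coefficients cancel between the two marginal likelihoods.
    """
    # Marginal likelihood when each group has its own Beta-distributed rate.
    log_m_separate = (
        log_beta(alpha + k1, beta + n1 - k1)
        + log_beta(alpha + k2, beta + n2 - k2)
        - 2.0 * log_beta(alpha, beta)
    )
    # Marginal likelihood when both groups share a single rate.
    log_m_shared = (
        log_beta(alpha + k1 + k2, beta + (n1 - k1) + (n2 - k2))
        - log_beta(alpha, beta)
    )
    return math.exp(log_m_separate - log_m_shared)
```

With identical observed rates, such as `parity_bayes_factor(5, 10, 5, 10)`, the factor falls below 1 and favors parity; with sharply different rates, such as `parity_bayes_factor(10, 10, 0, 10)`, it rises far above 1. A system could compare such a value against a threshold before deciding to modify a ranking, though the modification criteria themselves are not specified here.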
-
Publication number: US20200372406A1
Publication date: 2020-11-26
Application number: US16781945
Application date: 2020-02-04
Applicant: Oracle International Corporation
IPC: G06N20/00
Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias and the input data sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
-
Publication number: US12175344B2
Publication date: 2024-12-24
Application number: US18453929
Application date: 2023-08-22
Applicant: Oracle International Corporation
Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias and the input data sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
-
Publication number: US20240419900A1
Publication date: 2024-12-19
Application number: US18817147
Application date: 2024-08-27
Applicant: Oracle International Corporation
Inventor: Swetasudha Panda , Ariel Kobren , Michael Louis Wick , Stephen Green
IPC: G06F40/279 , G06N20/00
Abstract: Debiasing pre-trained sentence encoders with probabilistic dropouts may be performed by various systems, services, or applications. A sentence may be received, where the words of the sentence may be provided as tokens to an encoder of a machine learning model. A token-wise correlation using semantic orientation may be computed to determine a bias score for the tokens in the input sentence. A probability of dropout for tokens in the input sentence may be determined from the bias scores. The machine learning model may be trained or tuned based on the probabilities of dropout for the tokens in the input sentence.
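The step from bias scores to dropout probabilities can be sketched as follows in Python. The abstract does not say how scores map to probabilities, so the linear scaling by the largest absolute score, the `p_max` cap, the `[MASK]` placeholder, and both function names are illustrative assumptions rather than the patented method.

```python
import random
from typing import List, Sequence


def dropout_probabilities(
    bias_scores: Sequence[float], p_max: float = 0.5
) -> List[float]:
    """Map per-token bias scores to dropout probabilities in [0, p_max].

    Assumption: probability grows linearly with the absolute bias score,
    normalized by the largest absolute score in the sentence.
    """
    max_abs = max((abs(s) for s in bias_scores), default=0.0)
    if max_abs == 0.0:
        # No token carries a bias signal, so nothing is dropped.
        return [0.0 for _ in bias_scores]
    return [p_max * abs(s) / max_abs for s in bias_scores]


def apply_token_dropout(
    tokens: Sequence[str],
    probs: Sequence[float],
    rng: random.Random,
    mask_token: str = "[MASK]",
) -> List[str]:
    """Replace each token with mask_token with its own dropout probability."""
    return [
        mask_token if rng.random() < p else tok
        for tok, p in zip(tokens, probs)
    ]
```

During training or tuning, the masked token sequence would be passed to the encoder in place of the original, so that tokens with higher bias scores are seen less often.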
-
Publication number: US20230032208A1
Publication date: 2023-02-02
Application number: US17389900
Application date: 2021-07-30
Applicant: Oracle International Corporation
Inventor: Ariel Gedaliah Kobren , Naveen Jafer Nizar , Michael Louis Wick , Swetasudha Panda
IPC: G06N20/00 , G06K9/62 , G06F40/247
Abstract: Techniques are disclosed for augmenting data sets used for training machine learning models and for generating predictions by trained machine learning models. These techniques may increase a number (and diversity) of examples within an initial training dataset of sentences by extracting a subset of words from the existing training dataset of sentences. The extracted subset includes no stopwords and fewer content words than found in the initial training dataset. The remaining words may be re-ordered. Using the extracted and re-ordered subset of words, the dataset generation model produces a second set of sentences that are different from the first set. The second set of sentences may be used to increase a number of examples in classes with few examples.
-
Publication number: US20220382768A1
Publication date: 2022-12-01
Application number: US17819611
Application date: 2022-08-12
Applicant: Oracle International Corporation
IPC: G06F16/2457 , G06N20/00 , G06N20/20 , G06F17/18 , G06K9/62 , G06V20/20 , G02B27/01 , G06T19/00 , G09G3/00
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
-
Publication number: US20220245339A1
Publication date: 2022-08-04
Application number: US17589662
Application date: 2022-01-31
Applicant: Oracle International Corporation
Inventor: Swetasudha Panda , Ariel Kobren , Michael Louis Wick , Stephen Green
IPC: G06F40/279 , G06N20/00
Abstract: Debiasing pre-trained sentence encoders with probabilistic dropouts may be performed by various systems, services, or applications. A sentence may be received, where the words of the sentence may be provided as tokens to an encoder of a machine learning model. A token-wise correlation using semantic orientation may be computed to determine a bias score for the tokens in the input sentence. A probability of dropout for tokens in the input sentence may be determined from the bias scores. The machine learning model may be trained or tuned based on the probabilities of dropout for the tokens in the input sentence.
-
Publication number: US10410139B2
Publication date: 2019-09-10
Application number: US15168309
Application date: 2016-05-31
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventor: Pallika Haridas Kanani , Michael Louis Wick , Katherine Silverstein
Abstract: A system that performs natural language processing receives a text corpus that includes a plurality of documents and receives a knowledge base. The system generates a set of document n-grams from the text corpus and considers all n-grams as candidate mentions. The system, for each candidate mention, queries the knowledge base and in response retrieves results. From the results retrieved by the queries, the system generates a search space and generates a joint model from the search space.