Enforcing fairness on unlabeled data to improve modeling performance

    Publication number: US11775863B2

    Publication date: 2023-10-03

    Application number: US16781945

    Application date: 2020-02-04

    CPC classification number: G06N20/00 G06N3/088

    Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias, and the input data is sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
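
The data-generation step described in the abstract can be illustrated with a small sketch. This is not the patented procedure: the function name `make_biased_dataset`, the one-dimensional feature, the binary group attribute, and the reading of the "control for discrepancy between rarity of features" as a single `minority_rate` parameter are all assumptions made for illustration. Classifier training on the labeled and additional unlabeled points is not shown.

```python
import random


def make_biased_dataset(n, label_bias, selection_bias, minority_rate, seed=0):
    """Hypothetical generator: returns (x, group, observed_label) rows.

    label_bias:     probability that a true positive in group 1 is recorded as 0.
    selection_bias: probability that a true positive in group 1 is left out.
    minority_rate:  probability that a point belongs to group 1 (assumed
                    stand-in for the rarity control in the abstract).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = 1 if rng.random() < minority_rate else 0
        x = rng.gauss(0.0, 1.0)
        truth = 1 if x > 0 else 0  # ground truth, unobserved in practice
        observed = truth
        # Label bias: some positives of group 1 are recorded as negatives.
        if group == 1 and truth == 1 and rng.random() < label_bias:
            observed = 0
        # Selection bias: some positives of group 1 never enter the sample.
        if group == 1 and truth == 1 and rng.random() < selection_bias:
            continue
        rows.append((x, group, observed))
    return rows


def positive_rate(rows, group):
    """Share of rows in the given group whose observed label is 1."""
    labels = [y for _, g, y in rows if g == group]
    return sum(labels) / len(labels) if labels else 0.0


rows = make_biased_dataset(n=2000, label_bias=0.5, selection_bias=0.0, minority_rate=0.3)
print(positive_rate(rows, group=0), positive_rate(rows, group=1))
```

Comparing the two printed rates shows how the injected bias lowers the observed positive rate of group 1 even though both groups share the same underlying ground-truth rule.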

    Evaluating language models using negative data

    Publication number: US20210375262A1

    Publication date: 2021-12-02

    Application number: US16890263

    Application date: 2020-06-02

    Abstract: A method of evaluating a language model using negative data may include accessing a first language model that is trained using a first training corpus, and accessing a second language model. The second language model may be configured to generate outputs that are less grammatical than outputs generated by the first language model. The method may also include training the second language model using a second training corpus, and generating output text from the second language model. The method may further include testing the first language model using the output text from the second language model.
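
The evaluation loop described in the abstract can be sketched with toy models. Everything here is an illustrative assumption rather than the method claimed in the application: `BigramLM` stands in for the first language model, `UnigramSampler` (which ignores word order) stands in for the less grammatical second model, the two corpora are identical toy sentences, and `fraction_rejected` is a hypothetical test metric.

```python
import math
import random
from collections import Counter


class BigramLM:
    """First model under test: an add-one smoothed bigram model (assumed stand-in)."""

    def __init__(self, sentences):
        self.vocab = {w for s in sentences for w in s} | {"<s>", "</s>"}
        self.bigrams = Counter()
        self.context = Counter()
        for s in sentences:
            tokens = ["<s>"] + s + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def avg_log_prob(self, sentence):
        """Mean log-probability per bigram transition, with add-one smoothing."""
        tokens = ["<s>"] + sentence + ["</s>"]
        size = len(self.vocab)
        total = sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.context[p] + size))
            for p, c in zip(tokens, tokens[1:])
        )
        return total / (len(tokens) - 1)


class UnigramSampler:
    """Second model: samples words independently, so word order is ignored."""

    def __init__(self, sentences, seed=0):
        self.words = [w for s in sentences for w in s]
        self.rng = random.Random(seed)

    def generate(self, length):
        return [self.rng.choice(self.words) for _ in range(length)]


def fraction_rejected(model, reference, negatives):
    """Share of negative samples that the model scores below the reference."""
    ref_score = model.avg_log_prob(reference)
    return sum(model.avg_log_prob(n) < ref_score for n in negatives) / len(negatives)


first_corpus = [s.split() for s in (
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat saw the dog",
)]
second_corpus = first_corpus  # assumption: both toy models share one corpus

first_lm = BigramLM(first_corpus)
second_lm = UnigramSampler(second_corpus, seed=0)
negatives = [second_lm.generate(6) for _ in range(50)]  # negative data
reference = "the cat sat on the rug".split()

print(fraction_rejected(first_lm, reference, negatives))
```

A first model that captures word order should score most of the scrambled samples below the grammatical reference sentence, so a value near 1 indicates that it passes this toy test.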
