-
Publication No.: US20220383120A1
Publication Date: 2022-12-01
Application No.: US17827448
Filing Date: 2022-05-27
Applicant: Google LLC
Inventors: Dara Bahri, Donald Arthur Metzler, Jr., Hanxi Heinrich Jiang, Yi Tay
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network having a plurality of network parameters. One of the methods includes obtaining an unlabeled training input from a set of unlabeled training data; processing the unlabeled training input to generate a first embedding; generating a corrupted version of the unlabeled training input, comprising determining a proper subset of the feature dimensions and, for each feature dimension that is in the proper subset of feature dimensions, applying a corruption to the respective feature in the feature dimension using one or more feature values sampled from a marginal distribution of the feature dimension as specified in the set of unlabeled training data; processing the corrupted version of the unlabeled training input to generate a second embedding; and determining an update to the current values of the plurality of network parameters.
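The corruption step in this abstract can be illustrated with a minimal NumPy sketch. It assumes tabular data stored as an `[n, d]` array, a hypothetical `corruption_rate` parameter that fixes the size of the corrupted subset, and an InfoNCE-style contrastive loss (`info_nce`, `temperature`) for the parameter update. The abstract only says that an update is "determined", so the loss is an assumption here, not the claimed method. Sampling a random training row for each chosen dimension is one way to draw from that dimension's empirical marginal distribution.

```python
import numpy as np


def corrupt(x, train_data, corruption_rate=0.6, rng=None):
    """Hypothetical helper: corrupt one example x (shape [d]) by replacing a
    random proper subset of its feature dimensions with values drawn from each
    dimension's empirical marginal in train_data (shape [n, d]). Needs d >= 2."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = train_data.shape
    assert d >= 2, "a non-empty proper subset needs at least two dimensions"
    # Clip to [1, d - 1] so the corrupted subset is non-empty and proper.
    k = int(np.clip(round(corruption_rate * d), 1, d - 1))
    dims = rng.choice(d, size=k, replace=False)
    corrupted = np.array(x, dtype=train_data.dtype, copy=True)
    # An independent random row per chosen column samples that column's marginal.
    rows = rng.integers(0, n, size=k)
    corrupted[dims] = train_data[rows, dims]
    return corrupted, dims


def info_nce(z1, z2, temperature=1.0):
    """Assumed contrastive loss: row i of z1 (clean) and z2 (corrupted) form
    a positive pair, and all other rows in the batch act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In a full training loop, `x` and its corrupted version would both pass through the encoder to produce the first and second embeddings. The gradient of the loss with respect to the network parameters would then give the update. That encoder and optimizer are left out of this sketch.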
-
Publication No.: US20220036203A1
Publication Date: 2022-02-03
Application No.: US17298766
Filing Date: 2019-10-16
Applicant: Google LLC
Inventors: Ofir Nachum, Hanxi Heinrich Jiang
IPC: G06N5/02
Abstract: The present disclosure is directed to systems and methods for identifying and correcting label bias in machine learning via intelligent re-weighting of training examples. In particular, aspects of the present disclosure leverage a problem formulation which assumes the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases towards certain groups. Despite the fact that a biased training dataset provides only observations of the biased labels, the systems and methods described herein can nevertheless correct the bias by re-weighting the data points without changing the labels.
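The abstract does not give the weighting rule, so the sketch below substitutes a simpler, well-known technique: Kamiran and Calders' reweighing, with w(g, y) = P(g) P(y) / P(g, y). It shares the key property described above, in that labels are never changed and only per-example weights are. It is not the patented method. That method models an unknown unbiased label and may compute its weights differently, for example through an iterative procedure. The function name `reweigh` is hypothetical.

```python
import numpy as np


def reweigh(groups, labels):
    """Hypothetical helper (Kamiran-Calders reweighing, not the patented
    method): weight each example by P(g) * P(y) / P(g, y) so that, under the
    weights, every group has the same positive rate as the full dataset."""
    groups = np.asarray(groups)
    labels = np.asarray(labels)
    weights = np.empty(len(labels), dtype=float)
    for g in np.unique(groups):
        in_g = groups == g
        p_g = in_g.mean()
        for y in np.unique(labels):
            cell = in_g & (labels == y)
            if cell.any():
                p_y = (labels == y).mean()
                weights[cell] = p_g * p_y / cell.mean()
    return weights
```

A classifier trained with these weights sees the group label and the class label as statistically independent, even though the labels themselves are unchanged.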
-