-
Publication No.: US11983171B2
Publication date: 2024-05-14
Application No.: US18219333
Filing date: 2023-07-07
Applicant: Xerox Corporation
Inventor: Matthew Shreve, Francisco E. Torres, Raja Bala, Robert R. Price, Pei Li
CPC classification number: G06F16/2379, G06N20/00
Abstract: A method of labeling a dataset includes inputting a testing set comprising a plurality of input data samples into a plurality of pre-trained machine learning models to generate a set of embeddings output by the plurality of pre-trained machine learning models. The method further includes performing an iterative cluster labeling algorithm that includes generating a plurality of clusterings from the set of embeddings, analyzing the plurality of clusterings to identify a target embedding with a highest cluster quality, analyzing the target embedding to determine a compactness for each of the plurality of clusterings of the target embedding, and identifying a target cluster among the plurality of clusterings of the target embedding based on the compactness. The method further includes assigning pseudo-labels to the subset of the plurality of input data samples that are members of the target cluster.
-
Publication No.: US20230350880A1
Publication date: 2023-11-02
Application No.: US18219333
Filing date: 2023-07-07
Applicant: Xerox Corporation
Inventor: Matthew Shreve, Francisco E. Torres, Raja Bala, Robert R. Price, Pei Li
CPC classification number: G06F16/2379, G06N20/00
Abstract: A method of labeling a dataset includes inputting a testing set comprising a plurality of input data samples into a plurality of pre-trained machine learning models to generate a set of embeddings output by the plurality of pre-trained machine learning models. The method further includes performing an iterative cluster labeling algorithm that includes generating a plurality of clusterings from the set of embeddings, analyzing the plurality of clusterings to identify a target embedding with a highest cluster quality, analyzing the target embedding to determine a compactness for each of the plurality of clusterings of the target embedding, and identifying a target cluster among the plurality of clusterings of the target embedding based on the compactness. The method further includes assigning pseudo-labels to the subset of the plurality of input data samples that are members of the target cluster.
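
The iterative cluster labeling loop described in this abstract can be sketched roughly as follows. This is a minimal, non-authoritative illustration in plain Python, not the patented implementation. The tiny k-means routine, the `quality` score (centroid separation divided by mean compactness), the `min_size` guard against singleton clusters, and the names `model_a` and `model_b` are all assumptions made for the example; the abstract does not specify the clustering algorithm or the metrics.

```python
import math
from itertools import combinations

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def members_of(points, assign, j):
    return [p for p, a in zip(points, assign) if a == j]

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (assumed): farthest-point init, Lloyd updates."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            m = members_of(points, assign, j)
            if m:
                centers[j] = centroid(m)
    return assign

def compactness(points, assign, j):
    """Mean distance of cluster j's members to their centroid (lower = tighter)."""
    m = members_of(points, assign, j)
    c = centroid(m)
    return sum(dist(p, c) for p in m) / len(m)

def quality(points, assign):
    """Assumed quality score: min centroid separation / mean compactness."""
    ids = sorted(set(assign))
    if len(ids) < 2:
        return 0.0
    cents = [centroid(members_of(points, assign, j)) for j in ids]
    sep = min(dist(a, b) for a, b in combinations(cents, 2))
    mean_c = sum(compactness(points, assign, j) for j in ids) / len(ids)
    return sep / (mean_c + 1e-9)

def pseudo_label(embeddings, k=2, rounds=1, min_size=2):
    """embeddings: dict of model name -> list of vectors, one per sample."""
    n = len(next(iter(embeddings.values())))
    remaining, labels = list(range(n)), {}
    for r in range(rounds):
        if len(remaining) < k:
            break
        # One clustering per embedding; keep the embedding with the best quality.
        scored = []
        for name, vecs in embeddings.items():
            pts = [vecs[i] for i in remaining]
            assign = kmeans(pts, k)
            scored.append((quality(pts, assign), name, pts, assign))
        _, name, pts, assign = max(scored, key=lambda s: s[0])
        # Within the target embedding, pick the most compact cluster.
        candidates = [j for j in set(assign) if assign.count(j) >= min_size]
        if not candidates:
            break
        target = min(candidates, key=lambda j: compactness(pts, assign, j))
        for i, a in zip(remaining, assign):
            if a == target:
                labels[i] = f"pseudo_{r}"
        # Labeled samples are removed before the next iteration.
        remaining = [i for i in remaining if i not in labels]
    return labels

# Hypothetical 2-D embeddings of six samples from two made-up models.
model_a = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (11, 10), (10, 11)]
model_b = [(0, 0), (5, 5), (1, 4), (4, 1), (2, 2), (3, 3)]
labels = pseudo_label({"model_a": model_a, "model_b": model_b}, k=2, rounds=1)
```

In this toy data, `model_a` separates the samples far more cleanly than `model_b`, so it is chosen as the target embedding, and its tightest cluster (samples 0 to 2) receives the first pseudo-label. Raising `rounds` repeats the selection on the remaining unlabeled samples.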
-
Publication No.: US20240281431A1
Publication date: 2024-08-22
Application No.: US18647425
Filing date: 2024-04-26
Applicant: Xerox Corporation
Inventor: Matthew Shreve, Francisco E. Torres, Raja Bala, Robert R. Price, Pei Li
CPC classification number: G06F16/2379, G06N20/00
Abstract: A method of labeling training data includes inputting a plurality of unlabeled input data samples into each of a plurality of pre-trained neural networks and extracting a set of feature embeddings from multiple layer depths of each of the plurality of pre-trained neural networks. The method also includes generating a plurality of clusterings from the set of feature embeddings. The method also includes analyzing, by a processing device, the plurality of clusterings to identify a subset of the plurality of unlabeled input data samples that belong to a same unknown class. The method also includes assigning pseudo-labels to the subset of the plurality of unlabeled input data samples.
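
One simple way to read "analyzing the plurality of clusterings to identify a subset ... that belong to a same unknown class" is unanimous co-clustering: samples that land in the same cluster in every clustering are grouped together. The sketch below illustrates only that reading. The abstract does not say how the clusterings are analyzed, so the agreement rule, the `min_size` threshold, and the example cluster assignments (standing in for clusterings of embeddings from different networks and layer depths) are assumptions.

```python
from collections import defaultdict

def consensus_groups(clusterings, min_size=2):
    """Group samples that share a cluster in every clustering.

    clusterings: list of cluster-id lists, one list per (network, layer)
    embedding, all covering the same samples in the same order.
    Returns groups of sample indices, largest first.
    """
    n = len(clusterings[0])
    groups = defaultdict(list)
    for i in range(n):
        # A sample's signature is its cluster id in each clustering.
        signature = tuple(c[i] for c in clusterings)
        groups[signature].append(i)
    kept = [g for g in groups.values() if len(g) >= min_size]
    return sorted(kept, key=len, reverse=True)

# Hypothetical cluster assignments for six unlabeled samples.
clusterings = [
    [0, 0, 0, 1, 1, 2],  # e.g. network A, shallow layer
    [1, 1, 1, 0, 0, 0],  # e.g. network A, deep layer
    [0, 0, 0, 0, 1, 1],  # e.g. network B
]
groups = consensus_groups(clusterings)

# Assign one pseudo-label per agreed group.
pseudo_labels = {i: f"pseudo_{g}" for g, group in enumerate(groups) for i in group}
```

Here only samples 0, 1, and 2 are grouped together by all three clusterings, so they form the single subset that receives a pseudo-label. A softer variant could require agreement in most rather than all clusterings.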