SYSTEMS, METHODS, AND NON-TRANSITORY COMPUTER-READABLE STORAGE DEVICES FOR DETECTING AND ANALYZING DATA CLONES IN TABULAR DATASETS

    Publication number: US20240152578A1

    Publication date: 2024-05-09

    Application number: US18386023

    Filing date: 2023-11-01

    CPC classification number: G06F18/22 G06T11/206

    Abstract: A computerized method for detecting and analyzing data clones in one or more dataset pairs has the steps of: obtaining one or more similarity matrices and one or more sets of readout values of the one or more similarity matrices from the dataset pairs using a data-clone detection method, each set of readout values corresponding to a similarity matrix; obtaining one or more importance values for the one or more similarity matrices by processing the one or more sets of readout values using an interpretation method, each importance value corresponding to a similarity matrix; obtaining one or more weighted similarity matrices by weighting each similarity matrix using the corresponding importance value; and obtaining one or more summed similarity matrices by grouping and summing the weighted similarity matrices according to one or more categories for providing a result with indications of locations of the data clones in the dataset pairs.
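    The weighting and grouping steps recited in the abstract can be illustrated with a minimal sketch. It assumes the similarity matrices and their importance values have already been produced by some data-clone detection method and interpretation method, neither of which the abstract specifies. The function names, the category labels, and the use of the largest entry as a location indicator are hypothetical choices made for illustration only, not the claimed implementation.

```python
def summed_similarity_matrices(matrices, importances, categories):
    """Weight each similarity matrix by its importance value, then sum per category.

    matrices    -- list of 2-D lists (one similarity matrix per entry)
    importances -- list of floats, one importance value per matrix
    categories  -- list of labels, one category per matrix (hypothetical labels)
    """
    if not (len(matrices) == len(importances) == len(categories)):
        raise ValueError("expected one importance value and one category per matrix")

    sums = {}
    for matrix, weight, category in zip(matrices, importances, categories):
        # Weighted similarity matrix: every entry scaled by the importance value.
        weighted = [[weight * value for value in row] for row in matrix]
        if category not in sums:
            sums[category] = weighted
            continue
        acc = sums[category]
        if len(acc) != len(weighted) or any(
            len(a) != len(b) for a, b in zip(acc, weighted)
        ):
            raise ValueError("matrices in one category must share a shape")
        # Element-wise sum into the running total for this category.
        sums[category] = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(acc, weighted)
        ]
    return sums


def peak_location(matrix):
    """Return the (row, column) of the largest entry, used here as an assumed
    stand-in for an indication of where a data clone is located."""
    cells = ((i, j) for i, row in enumerate(matrix) for j in range(len(row)))
    return max(cells, key=lambda ij: matrix[ij[0]][ij[1]])


# Toy inputs: three 2x2 similarity matrices in two hypothetical categories.
matrices = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
importances = [0.5, 0.25, 2.0]
categories = ["numeric", "numeric", "text"]

result = summed_similarity_matrices(matrices, importances, categories)
```

    In this toy run, `result` holds one summed matrix per category, and `peak_location(result["text"])` points at the cell with the strongest weighted similarity in the hypothetical "text" group.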
