Fingerprint-based data classification

    Publication number: US11886468B2

    Publication date: 2024-01-30

    Application number: US17541704

    Application date: 2021-12-03

    CPC classification number: G06F16/285 G06F16/221 G06F16/2264 G06N20/00

    Abstract: Systems and methods are provided for automated classification of data using fingerprints. In embodiments, a method includes: generating, by a computing device based on predetermined rules, a fingerprint of a data column in a data set to be classified, the fingerprint comprising dimensions, wherein each of the dimensions is assigned an attribute representing a characteristic of data in the data column; determining, by the computing device, that the fingerprint matches one or more target fingerprints by comparing the fingerprint to the target fingerprints, wherein each target fingerprint is associated with a class and includes dimensions, and each dimension is assigned an attribute representing a characteristic of data in the class; and assigning, by the computing device, one or more classes to the data column based on the one or more target fingerprints, thereby generating classified data.
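
The flow in this abstract (build a fingerprint from rules, compare it with target fingerprints, assign the matching classes) can be illustrated with a minimal sketch. The two dimensions, the rule functions, the target fingerprints, and the exact-equality matching criterion are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter

# Hypothetical rules: each one derives the attribute for a single dimension.
# Both assume a non-empty column of strings.
def dominant_type(values):
    kinds = Counter(
        "digit" if v.isdigit() else "alpha" if v.isalpha() else "mixed"
        for v in values
    )
    return kinds.most_common(1)[0][0]

def length_bucket(values):
    avg = sum(len(v) for v in values) / len(values)
    return "short" if avg <= 5 else "medium" if avg <= 20 else "long"

RULES = {"type": dominant_type, "length": length_bucket}

# Hypothetical target fingerprints, one per class.
TARGETS = {
    "postal_code": {"type": "digit", "length": "short"},
    "person_name": {"type": "alpha", "length": "medium"},
}

def fingerprint(column):
    """Assign one attribute per dimension by applying each rule to the column."""
    return {dim: rule(column) for dim, rule in RULES.items()}

def classify(column, targets=TARGETS):
    """Return every class whose target fingerprint equals the column's fingerprint."""
    fp = fingerprint(column)
    return [cls for cls, target in targets.items() if target == fp]

print(classify(["10001", "94105", "60614"]))
```

Exact equality is the simplest possible matching rule; a similarity score with a cutoff would be an equally plausible reading of "matches".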

    PREVIEW DATA LINEAGE RELATIONSHIP TO REDUCE ETL

    Publication number: US20240320234A1

    Publication date: 2024-09-26

    Application number: US18125882

    Application date: 2023-03-24

    CPC classification number: G06F16/254 G06F16/215

    Abstract: An approach is disclosed that receives a new ETL job. The job includes a number of intermediate database file descriptors corresponding to a plurality of intermediate database files that are used to accomplish the new ETL. A new data lineage graph is created that pertains to the new ETL job. The new data lineage graph is compared to a number of existing data lineage graphs with each of the existing data lineage graphs corresponding to an existing ETL job. The approach substitutes existing database files found in the existing data lineage graphs for one or more intermediate database files found in the new data lineage graph. The new ETL job is then run by utilizing the substituted database files, the result being a new final database file.
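
One way to picture the comparison step is to give every intermediate file a signature describing how it is derived, then reuse an existing file whenever the signatures agree. The sketch below assumes a lineage graph is a dictionary mapping each derived file to an `(operation, inputs)` pair; that representation, the `signature` and `substitute` helpers, and the file names are hypothetical and not taken from the abstract.

```python
def signature(node, graph):
    """Canonical description of how `node` is derived.

    Files that are not derived in `graph` (sources) sign as their own name.
    """
    if node not in graph:
        return node
    op, inputs = graph[node]
    return (op, tuple(signature(i, graph) for i in inputs))

def substitute(new_graph, existing_graphs):
    """Map intermediate files of the new job to equivalent existing files."""
    available = {}
    for g in existing_graphs:
        for node in g:
            available.setdefault(signature(node, g), node)
    result = {}
    for node in new_graph:
        sig = signature(node, new_graph)
        if sig in available:
            result[node] = available[sig]
    return result

existing = {
    "clean_orders_v1": ("dedupe", ["orders.csv"]),
    "daily_v1": ("aggregate_by_day", ["clean_orders_v1"]),
}
new = {
    "tmp_clean": ("dedupe", ["orders.csv"]),
    "tmp_joined": ("join_customers", ["tmp_clean", "customers.csv"]),
}
print(substitute(new, [existing]))
```

In this example only `tmp_clean` has an equivalent in the existing job, so the new job would still need to compute `tmp_joined`, but could read `clean_orders_v1` instead of rebuilding the deduplicated file.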

    New Data Class Generation Based on Static Reference Data

    Publication number: US20240386032A1

    Publication date: 2024-11-21

    Application number: US18317475

    Application date: 2023-05-15

    Abstract: New data class generation is provided. A dimension score is generated for each respective dimension of a plurality of predefined dimensions as relating to column attributes of a data asset while performing a static reference data analysis of the data asset. The dimension scores of the respective dimensions are added together to obtain a total dimension score for the data asset. It is determined whether the total dimension score of the data asset is greater than a predefined minimum dimension score threshold level. The data asset is identified as new static reference data in response to determining that the total dimension score of the data asset is greater than the predefined minimum dimension score threshold level. A new data class is generated based on the new static reference data.
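
The scoring logic in this abstract reduces to summing per-dimension scores and comparing the total with a threshold. The following sketch assumes three illustrative dimensions (row count, key uniqueness, column count), a score of 0 or 1 per dimension, and a threshold of 2.0; none of these values come from the abstract, and the shape of the generated data class is likewise hypothetical.

```python
MIN_TOTAL_SCORE = 2.0  # hypothetical minimum dimension score threshold

def dimension_scores(asset):
    """Score each hypothetical dimension from the asset's column attributes."""
    rows = asset["rows"]
    keys = [r[0] for r in rows]
    width = len(rows[0]) if rows else 0
    return {
        "small_row_count": 1.0 if 0 < len(rows) <= 500 else 0.0,
        "unique_key_column": 1.0 if rows and len(set(keys)) == len(keys) else 0.0,
        "few_columns": 1.0 if 0 < width <= 3 else 0.0,
    }

def maybe_new_data_class(asset, threshold=MIN_TOTAL_SCORE):
    """Return a new data class if the total score exceeds the threshold, else None."""
    total = sum(dimension_scores(asset).values())
    if total > threshold:
        # The asset counts as static reference data; derive a class from its keys.
        return {
            "name": asset["name"],
            "valid_values": sorted({r[0] for r in asset["rows"]}),
        }
    return None

countries = {
    "name": "country_codes",
    "rows": [("US", "United States"), ("DE", "Germany"), ("JP", "Japan")],
}
print(maybe_new_data_class(countries))
```

The comparison is strictly greater than, following the wording of the abstract, so an asset scoring exactly at the threshold is not treated as reference data.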

    Mutual Exclusion Data Class Analysis in Data Governance

    Publication number: US20230297596A1

    Publication date: 2023-09-21

    Application number: US17654858

    Application date: 2022-03-15

    CPC classification number: G06F16/285 G06F16/221

    Abstract: Performing a mutual exclusion data class analysis is provided. A data class group of a plurality of data class groups that a matching data class is a member of is identified. The matching data class matches data in a plurality of rows of a column in a data asset. Data classes included in the data class group that the matching data class is a member of are identified. A mutual exclusion data class is filtered from the data class group to form a filtered data class group for the column. The filtered data class group is run against the column of the data asset, thereby decreasing processing time and resource utilization of a computer.
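
A rough sketch of the filtering idea: once one class is known to match a column, classes declared mutually exclusive with it are dropped from its group before the remaining classes are evaluated. The group, the exclusion table, the regular expressions, and the rule that a class matches only when every value matches are assumptions chosen for illustration.

```python
import re

# Hypothetical data class group, exclusion table, and class patterns.
GROUPS = {"numeric_ids": ["us_zip", "us_ssn", "numeric_code"]}
MUTUALLY_EXCLUSIVE = {"us_zip": {"us_ssn"}, "us_ssn": {"us_zip"}}
PATTERNS = {
    "us_zip": re.compile(r"\d{5}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "numeric_code": re.compile(r"\d+"),
}

def filtered_group(matching_class):
    """Return the matching class's group minus classes that exclude it."""
    group = [matching_class]
    for members in GROUPS.values():
        if matching_class in members:
            group = members
            break
    excluded = MUTUALLY_EXCLUSIVE.get(matching_class, set())
    return [c for c in group if c not in excluded]

def run_classes(column, classes):
    """Evaluate only the given classes; a class matches if every value matches."""
    return [c for c in classes if all(PATTERNS[c].fullmatch(v) for v in column)]

candidates = filtered_group("us_zip")
print(candidates)
print(run_classes(["10001", "94105"], candidates))
```

Here the `us_ssn` pattern is never evaluated against the column, which is where the reduction in processing comes from in this reading of the abstract.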
