SIMULATED RISK CONTRIBUTION
    Invention application

    Publication No.: US20210049282A1

    Publication date: 2021-02-18

    Application No.: US16991199

    Filing date: 2020-08-12

    IPC classification: G06F21/57 G06F21/62

    Abstract: Computing devices utilizing computer-readable media implement methods for deriving risk contribution models from a dataset. Rather than inspect the entire data model to identify all quasi-identifying fields, the computing device develops a list of commonly occurring but difficult-to-detect quasi-identifying fields. For each such field, the computing device creates a distribution of values/information values from other sources. Then, when risk measurement is performed, random simulated values (or information values) are selected for these fields. Quasi-identifying values are then selected for each field with multiplicity equal to the associated randomly selected count. These are incorporated into the overall risk measurement and used in an anonymization process. In typical implementations, the overall average of the re-identification risk measurement results proves to be generally consistent with the results obtained on the fully classified data model.
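    The abstract describes the simulation only at a high level. Below is a minimal Python sketch of the idea, assuming a single undetected quasi-identifying field whose count and value distributions (COUNT_DIST and VALUE_DIST, both hypothetical) come from an external source. It also assumes that average re-identification risk is measured as the mean of 1/(equivalence class size) over all records. That estimator is a common textbook choice swapped in for illustration; the abstract does not specify which risk measure is used.

```python
import random
from collections import Counter

# HYPOTHETICAL distributions for one hard-to-detect quasi-identifying field,
# assumed to come from an external source (e.g. number of diagnosis codes per
# person, and how often each code occurs). Values are illustrative only.
COUNT_DIST = {0: 0.2, 1: 0.5, 2: 0.3}              # P(person has n values)
VALUE_DIST = {"A01": 0.6, "B02": 0.3, "C03": 0.1}  # P(a given value)


def weighted_choice(dist, rng):
    """Draw one key from a {key: weight} mapping."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def simulate_field(rng):
    """Draw a random count, then draw that many values (multiplicity = count)."""
    n = weighted_choice(COUNT_DIST, rng)
    values = [weighted_choice(VALUE_DIST, rng) for _ in range(n)]
    return tuple(sorted(values))  # order-insensitive representation


def average_risk(records):
    """Mean over records of 1 / (size of the record's equivalence class)."""
    class_sizes = Counter(records)
    return sum(1 / class_sizes[r] for r in records) / len(records)


def simulated_risk(known_qis, trials=20, seed=0):
    """Append simulated values for the undetected field to each record's known
    quasi-identifiers, measure risk, and average the result over several trials."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        augmented = [qi + (simulate_field(rng),) for qi in known_qis]
        results.append(average_risk(augmented))
    return sum(results) / trials


if __name__ == "__main__":
    records = [("F", "1980"), ("F", "1980"), ("M", "1975"), ("M", "1975")]
    print("known fields only:", average_risk(records))  # 0.5
    print("with simulated field:", simulated_risk(records))
```

    Adding a simulated field can only split equivalence classes, never merge them, so the simulated estimate is always at least the risk computed from the known fields alone. Averaging over several random trials mirrors the abstract's point that the overall average, rather than any single draw, is what tracks the fully classified result.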

    SYSTEM AND METHOD FOR INTERMEDIARY MAPPING AND DE-IDENTIFICATION OF NON-STANDARD DATASETS

    Publication No.: US20220129485A1

    Publication date: 2022-04-28

    Application No.: US17505863

    Filing date: 2021-10-20

    Abstract: Disclosed is a method for intermediary mapping and de-identification, comprising the steps of: retrieving datasets and metadata from a data source; selecting a target standard; mapping the retrieved datasets and metadata to the target standard using a schema mapping, a variable mapping, or a combination thereof; inferring, from the dataset mapping and metadata, one or more of variable classifications, variable connections, groupings, disclosure risk settings, and de-identification settings; and performing a de-identification propagation using the mapped datasets, the mapped metadata, the inferred variable classifications, variable connections, groupings, disclosure risk settings, de-identification settings, or a combination thereof.
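    The abstract lists the steps without fixing any data format. The following minimal Python sketch works under stated assumptions: the target standard (TARGET_STANDARD), the variable mapping (VARIABLE_MAP), the action names, and the keyed-hash pseudonymization are all hypothetical. Only a variable mapping is shown (no schema mapping, groupings, or disclosure risk settings). It illustrates how classifications and de-identification settings inferred once from the mapping can be propagated across two linked source tables.

```python
import hashlib
import hmac

# HYPOTHETICAL target standard: each standard variable carries a
# classification and a default de-identification setting.
TARGET_STANDARD = {
    "PATIENT_ID": ("direct_identifier", "pseudonymize"),
    "BIRTH_DATE": ("quasi_identifier", "generalize_to_year"),
    "LAB_RESULT": ("non_identifying", "keep"),
}

# HYPOTHETICAL variable mapping from two non-standard source tables that
# share a patient identifier (an inferred variable connection).
VARIABLE_MAP = {
    "visits": {"pid": "PATIENT_ID", "dob": "BIRTH_DATE"},
    "labs": {"patient": "PATIENT_ID", "result": "LAB_RESULT"},
}

SECRET_KEY = b"example-key"  # placeholder key, for illustration only


def infer_settings(variable_map, standard):
    """Infer (classification, action) for each source column via its mapped target variable."""
    return {
        table: {col: standard[target] for col, target in cols.items()}
        for table, cols in variable_map.items()
    }


def apply_action(action, value):
    """Apply one de-identification action to a single value."""
    if action == "pseudonymize":
        # Keyed hash: the same source ID yields the same pseudonym in every table.
        digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
        return "ID-" + digest[:8]
    if action == "generalize_to_year":
        return value[:4]  # assumes ISO dates such as "1980-05-17"
    if action == "keep":
        return value
    raise ValueError(f"unknown action: {action}")


def propagate(datasets, settings):
    """Apply each column's inferred action to every row of every mapped table."""
    return {
        table: [
            {col: apply_action(settings[table][col][1], val) for col, val in row.items()}
            for row in rows
        ]
        for table, rows in datasets.items()
    }


if __name__ == "__main__":
    data = {
        "visits": [{"pid": "p1", "dob": "1980-05-17"}],
        "labs": [{"patient": "p1", "result": "7.2"}],
    }
    print(propagate(data, infer_settings(VARIABLE_MAP, TARGET_STANDARD)))
```

    Because the settings are inferred per target variable rather than per source column, both "pid" and "patient" receive the same treatment, and the deterministic keyed hash keeps the link between the two tables intact after de-identification.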