ASYMMETRIC JOURNALIST RISK MODEL OF DATA RE-IDENTIFICATION

    Publication No.: US20170083719A1

    Publication date: 2017-03-23

    Application No.: US15271664

    Filing date: 2016-09-21

    IPC classification: G06F21/62 G06F17/30 G06F21/55

    Abstract: System and method to produce an anonymized cohort whose members have less than a predetermined risk of re-identification. The system includes a user-facing communication interface to receive an anonymized cohort request comprising traits to include in members of the cohort; a data source-facing communication channel to query a data source to find anonymized records that possess at least some of the requested traits; and a processor programmed to carry out the instructions of: forming a dataset from at least some of the anonymized records; calculating a risk of re-identification of the anonymized records in the dataset based upon the data query; perturbing anonymized records in the dataset that exceed a predetermined risk of re-identification, until the risk of re-identification is not greater than the predetermined threshold, to produce the anonymized cohort; and providing, via a user-facing communication channel, the anonymized cohort.
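    The perturb-until-below-threshold loop in this abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the patent's method: the risk of a record is taken as the reciprocal of its equivalence class size, the quasi-identifiers are hypothetical `age` and `sex` fields, and the perturbation is a coarsening of age bins applied to the whole dataset for simplicity (the abstract perturbs only the records that exceed the threshold).

```python
from collections import Counter


def record_risks(records, width):
    """Per-record risk, assumed to be 1 / equivalence class size.

    Records fall into the same class when they share a sex and an
    age bin of the given width (hypothetical quasi-identifiers).
    """
    keys = [(r["age"] // width, r["sex"]) for r in records]
    class_sizes = Counter(keys)
    return [1 / class_sizes[k] for k in keys]


def build_cohort(records, threshold, max_width=128):
    """Double the age bin width until no record's risk exceeds threshold."""
    width = 1
    while width <= max_width:
        risks = record_risks(records, width)
        if not risks or max(risks) <= threshold:
            cohort = []
            for r in records:
                low = (r["age"] // width) * width
                cohort.append({"age_range": (low, low + width - 1),
                               "sex": r["sex"]})
            return cohort, width
        width *= 2
    raise ValueError("threshold not reachable by age generalization alone")
```

    For example, four records with ages 30 to 33 and the same sex reach a threshold of 0.5 once ages are reported in two-year bins.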

    MIXED NOISE MECHANISM FOR DATA ANONYMIZATION

    Publication No.: US20230100347A1

    Publication date: 2023-03-30

    Application No.: US17956928

    Filing date: 2022-09-30

    IPC classification: G06F21/62 G06F21/64

    Abstract: A method includes collecting one or more datasets of information. The method also includes separating the one or more datasets into respective blocks of data. The method further includes determining whether the information within the blocks of data is consistent or whether one or more violations occur within the blocks of data. In addition, the method includes applying a first noise function based on the determination that the information within the blocks of data is consistent, wherein the first noise function is applied when a loss of privacy and/or confidentiality exceeds a threshold. The method also includes displaying the blocks of data with the first noise function.
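    As a rough illustration of the block-wise flow described above, the following Python sketch splits values into blocks, checks each block for consistency, and adds noise only to consistent blocks when a privacy loss figure exceeds a threshold. The abstract does not define the consistency check, the privacy loss measure, or the noise function, so all three are assumptions here: consistency is a hypothetical range check, `privacy_loss` is a number supplied by the caller, and the noise is Laplace noise. Inconsistent blocks are returned unchanged because the abstract does not say how they are handled.

```python
import random


def split_into_blocks(values, block_size):
    """Separate a flat list of values into consecutive blocks."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def is_consistent(block, valid_range):
    """Hypothetical consistency rule: every value lies in valid_range."""
    low, high = valid_range
    return all(low <= v <= high for v in block)


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def anonymize_blocks(values, block_size, valid_range,
                     privacy_loss, loss_threshold, scale, rng=None):
    """Add noise to consistent blocks only when privacy_loss > loss_threshold."""
    rng = rng or random.Random()
    result = []
    for block in split_into_blocks(values, block_size):
        if is_consistent(block, valid_range) and privacy_loss > loss_threshold:
            result.append([v + laplace_noise(scale, rng) for v in block])
        else:
            result.append(list(block))  # left as-is in this sketch
    return result
```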

    DETERMINING JOURNALIST RISK OF A DATASET USING POPULATION EQUIVALENCE CLASS DISTRIBUTION ESTIMATION

    Publication No.: US20220115101A1

    Publication date: 2022-04-14

    Application No.: US17560350

    Filing date: 2021-12-23

    IPC classification: G16H10/60

    Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence class distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.
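    The final step, computing risk from an equivalence class distribution, can be sketched in Python under a common simplification: the risk of a record is the reciprocal of the size of its equivalence class. The function name and the input layout (a mapping from class size to the number of classes of that size) are assumptions for illustration, the distribution is assumed to be already estimated, and the patent's similarity-based estimation is not reproduced.

```python
from fractions import Fraction


def journalist_risk(class_distribution):
    """Maximum and average risk from an equivalence class distribution.

    class_distribution maps an equivalence class size to the number of
    classes of that size (assumed layout). Risk per record is taken as
    1 / class size.
    """
    sizes = {size: n for size, n in class_distribution.items() if n > 0}
    if not sizes or min(sizes) < 1:
        raise ValueError("need at least one class of size >= 1")
    n_classes = sum(sizes.values())
    n_records = sum(size * n for size, n in sizes.items())
    max_risk = Fraction(1, min(sizes))
    # Each class contributes size * (1 / size) = 1 to the summed risk,
    # so the per-record average is n_classes / n_records.
    avg_risk = Fraction(n_classes, n_records)
    return max_risk, avg_risk
```

    With two singleton classes, one class of two records, and one class of five, the nine records have a maximum risk of 1 and an average risk of 4/9.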

    Re-identification risk prediction

    Publication No.: US10380381B2

    Publication date: 2019-08-13

    Application No.: US15401221

    Filing date: 2017-01-09

    IPC classification: G06F21/62 H04L29/06

    Abstract: System and method to predict the risk of re-identification of a cohort if the cohort is anonymized using a de-identification strategy. An input anonymity histogram and a de-identification strategy are used to predict the anonymity histogram that would result from applying the de-identification strategy to the dataset. System embodiments compute a risk of re-identification from the predicted anonymity histogram.

    Smart de-identification using date jittering

    Publication No.: US10318763B2

    Publication date: 2019-06-11

    Application No.: US15385710

    Filing date: 2016-12-20

    IPC classification: H04L29/06 G06F21/62

    Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and the second boundary group. For each non-boundary group, the method calculates the maximum time limits by which the group can be time-shifted without overlapping an adjacent group, calculates a group jitter amount, caps the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jitters the group by the capped jitter amount to produce an anonymized dataset. The anonymized dataset is then returned.
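    The capped jittering of non-boundary groups can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: each group is a hypothetical `(start_day, end_day)` pair of integer day offsets, the input groups are sorted and non-overlapping, the raw jitter is drawn from a rounded Gaussian, a single `jitter_limit` applies to every group, and the two boundary groups are left in place. Groups are processed left to right, so each limit is measured against the already-shifted previous group and the original position of the next group.

```python
import random


def jitter_groups(groups, jitter_limit, rng=None):
    """Shift each non-boundary group by a capped random number of days."""
    rng = rng or random.Random()
    if len(groups) < 3:
        return list(groups)  # no non-boundary groups to jitter

    shifted = [groups[0]]  # first boundary group stays put
    for i in range(1, len(groups) - 1):
        start, end = groups[i]
        prev_end = shifted[-1][1]
        next_start = groups[i + 1][0]

        # Maximum time limits: how far the group can move either way
        # without touching an adjacent group.
        max_back = start - prev_end - 1
        max_forward = next_start - end - 1

        # Raw jitter amount (assumed distribution), capped first by the
        # predetermined jitter limit and then by the time limits.
        raw = round(rng.gauss(0, jitter_limit))
        capped = max(-jitter_limit, min(jitter_limit, raw))
        capped = max(-max_back, min(max_forward, capped))

        shifted.append((start + capped, end + capped))
    shifted.append(groups[-1])  # second boundary group stays put
    return shifted
```

    Shifting a whole group by one amount preserves the intervals between events inside the group, and the caps keep the temporal order of the groups intact.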

    SMART DE-IDENTIFICATION USING DATE JITTERING

    Publication No.: US20180173893A1

    Publication date: 2018-06-21

    Application No.: US15385710

    Filing date: 2016-12-20

    IPC classification: G06F21/62

    CPC classification: G06F21/6254

    Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and the second boundary group. For each non-boundary group, the method calculates the maximum time limits by which the group can be time-shifted without overlapping an adjacent group, calculates a group jitter amount, caps the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jitters the group by the capped jitter amount to produce an anonymized dataset. The anonymized dataset is then returned.