MIXED NOISE MECHANISM FOR DATA ANONYMIZATION

    公开(公告)号:US20230100347A1

    公开(公告)日:2023-03-30

    申请号:US17956928

    申请日:2022-09-30

    IPC分类号: G06F21/62 G06F21/64

    摘要: A method includes collecting one or more datasets of information. The method also includes separating the one or more datasets into respective blocks of data. The method further includes determining whether the information within the blocks of data are consistent, or if one or more violations occur within the blocks of data. In addition, the method includes applying a first noise function based on the determination that the information within the blocks of data are consistent, wherein the first noise function is applied when a loss of privacy and/or confidentiality exceeds a threshold. The method also includes displaying the blocks of data with the first noise function.

    DETERMINING JOURNALIST RISK OF A DATASET USING POPULATION EQUIVALENCE CLASS DISTRIBUTION ESTIMATION

    公开(公告)号:US20220115101A1

    公开(公告)日:2022-04-14

    申请号:US17560350

    申请日:2021-12-23

    IPC分类号: G16H10/60

    摘要: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.

    Re-identification risk prediction

    公开(公告)号:US10380381B2

    公开(公告)日:2019-08-13

    申请号:US15401221

    申请日:2017-01-09

    IPC分类号: G06F21/62 H04L29/06

    摘要: System and method to predict risk of re-identification of a cohort if the cohort is anonymized using a de-identification strategy. An input anonymity histogram and de-identification strategy is used to predict the anonymity histogram that would result from applying the de-identification strategy to the dataset. System embodiments compute a risk of re-identification from the predicted anonymity histogram.

    Smart de-identification using date jittering

    公开(公告)号:US10318763B2

    公开(公告)日:2019-06-11

    申请号:US15385710

    申请日:2016-12-20

    IPC分类号: H04L29/06 G06F21/62

    摘要: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.

    DETERMINING JOURNALIST RISK OF A DATASET USING POPULATION EQUIVALENCE CLASS DISTRIBUTION ESTIMATION

    公开(公告)号:US20230307104A1

    公开(公告)日:2023-09-28

    申请号:US18324453

    申请日:2023-05-26

    IPC分类号: G16H10/60

    CPC分类号: G16H10/60

    摘要: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.

    SMART DE-IDENTIFICATION USING DATE JITTERING

    公开(公告)号:US20220253559A1

    公开(公告)日:2022-08-11

    申请号:US17730592

    申请日:2022-04-27

    IPC分类号: G06F21/62

    摘要: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.

    Smart de-identification using date jittering

    公开(公告)号:US11334685B2

    公开(公告)日:2022-05-17

    申请号:US16802045

    申请日:2020-02-26

    IPC分类号: G06F21/00 G06F21/62

    摘要: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.