-
1.
Publication No.: US20240037123A1
Publication date: 2024-02-01
Application No.: US18484018
Filing date: 2023-10-10
IPC classification: G06F16/28, G06N5/04, G06F16/21, G06F18/213, G06F18/21
CPC classification: G06F16/285, G06F16/288, G06N5/04, G06F16/211, G06F18/213, G06F18/217
Abstract: Disclosed is a method for intermediary mapping and de-identification, comprising the steps of: retrieving datasets and metadata from a data source; selecting a target standard; mapping the retrieved datasets and the metadata to the target standard, wherein the datasets and the metadata are mapped to the target standard using a schema mapping, a variable mapping, or a combination thereof; inferring one or more of variable classifications, variable connections, groupings, disclosure risk settings, and de-identification settings using the dataset mapping and metadata; and performing a de-identification propagation using the mapped datasets, the mapped metadata, the inferred variable classifications, the inferred variable connections, the inferred groupings, the inferred disclosure risk settings, the inferred de-identification settings, or a combination thereof.
-
2.
Publication No.: US10803201B1
Publication date: 2020-10-13
Application No.: US15904465
Filing date: 2018-02-26
Abstract: System and method to produce an anonymized electronic data product having an individually-determined threshold of re-identification risk, and to adjust re-identification risk measurement parameters based on individual characteristics such as geographic location, in order to provide an anonymized electronic data product having a sensitivity-based reduced risk of re-identification.
-
3.
Publication No.: US20230237196A1
Publication date: 2023-07-27
Application No.: US18128938
Filing date: 2023-03-30
CPC classification: G06F21/6254, G06N20/00, G06F2221/2107
Abstract: A data anonymization pipeline system for managing holding and pooling data is disclosed. The data anonymization pipeline system transforms personal data at a source and then stores the transformed data in a safe environment. Furthermore, a re-identification risk assessment is performed before providing access to a user to fetch the de-identified data for secondary purposes.
-
4.
Publication No.: US20230100347A1
Publication date: 2023-03-30
Application No.: US17956928
Filing date: 2022-09-30
Abstract: A method includes collecting one or more datasets of information. The method also includes separating the one or more datasets into respective blocks of data. The method further includes determining whether the information within the blocks of data is consistent, or whether one or more violations occur within the blocks of data. In addition, the method includes applying a first noise function based on the determination that the information within the blocks of data is consistent, wherein the first noise function is applied when a loss of privacy and/or confidentiality exceeds a threshold. The method also includes displaying the blocks of data with the first noise function.
-
5.
Publication No.: US20220115101A1
Publication date: 2022-04-14
Application No.: US17560350
Filing date: 2021-12-23
Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose
IPC classification: G16H10/60
Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence class distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.
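The equivalence class distribution named in this abstract can be illustrated with a minimal sketch. It assumes records are plain dictionaries and that an equivalence class is the set of records sharing identical quasi-identifier values. The function names, field names, and sample values are hypothetical. The risk figures are the common 1/class-size measures computed on the sample itself; the abstract's conversion from a similarity distribution and its projection step are not reproduced here.

```python
from collections import Counter


def equivalence_class_distribution(records, quasi_identifiers):
    """Map each equivalence class size to the number of classes of that size."""
    # One class per distinct combination of quasi-identifier values.
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return Counter(class_sizes.values())


def average_risk(distribution):
    """Average of 1/size over all records.

    A class of size s holds s records, each with risk 1/s, so every class
    contributes exactly 1 to the sum: the average is classes / records.
    """
    n_classes = sum(distribution.values())
    n_records = sum(size * count for size, count in distribution.items())
    return n_classes / n_records if n_records else 0.0


def max_risk(distribution):
    """Risk of the most exposed record: 1 / (smallest class size)."""
    return 1.0 / min(distribution) if distribution else 0.0


# Hypothetical sample: three classes of sizes 3, 2, and 1.
sample = [
    {"age_band": "30-39", "sex": "F", "region": "K1A"},
    {"age_band": "30-39", "sex": "F", "region": "K1A"},
    {"age_band": "30-39", "sex": "F", "region": "K1A"},
    {"age_band": "40-49", "sex": "M", "region": "K2B"},
    {"age_band": "40-49", "sex": "M", "region": "K2B"},
    {"age_band": "50-59", "sex": "F", "region": "K1A"},
]
dist = equivalence_class_distribution(sample, ["age_band", "sex", "region"])
print(dict(dist), average_risk(dist), max_risk(dist))
```

Working from the size histogram rather than from individual records keeps the risk computation independent of dataset size, which is presumably why a distribution is the intermediate representation.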
-
6.
Publication No.: US10380381B2
Publication date: 2019-08-13
Application No.: US15401221
Filing date: 2017-01-09
Inventors: Martin Scaiano, Andrew Baker, Stephen Korte
Abstract: System and method to predict risk of re-identification of a cohort if the cohort is anonymized using a de-identification strategy. An input anonymity histogram and a de-identification strategy are used to predict the anonymity histogram that would result from applying the de-identification strategy to the dataset. System embodiments compute a risk of re-identification from the predicted anonymity histogram.
-
7.
Publication No.: US10318763B2
Publication date: 2019-06-11
Application No.: US15385710
Filing date: 2016-12-20
Inventors: Sean Rose, Weilong Song, Martin Scaiano
Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, the method includes calculating the maximum time limits by which the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. The anonymized dataset is then returned.
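The capped jitter step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: groups are (start, end) offsets in days, already sorted and non-overlapping; the raw jitter is drawn from a Gaussian; boundary groups are left unshifted; and groups are processed in order so that each backward limit is measured against the already-shifted previous group. The name `jitter_groups` and its parameters are hypothetical.

```python
import random


def jitter_groups(groups, jitter_limit, rng=None):
    """Shift each non-boundary group in time without overlapping a neighbour.

    groups: list of (start, end) day offsets, sorted and non-overlapping.
    jitter_limit: predetermined cap on the absolute shift, in days.
    """
    rng = rng or random.Random()
    shifted = list(groups)
    # The first and last groups are the boundary groups; skip them.
    for i in range(1, len(groups) - 1):
        start, end = shifted[i]
        prev_end = shifted[i - 1][1]    # neighbour already jittered (or boundary)
        next_start = shifted[i + 1][0]  # neighbour not yet jittered
        max_back = start - prev_end     # largest backward shift without overlap
        max_forward = next_start - end  # largest forward shift without overlap
        # Raw jitter amount (assumed Gaussian for this sketch).
        amount = rng.gauss(0, jitter_limit)
        # Cap by the maximum time limits and by the predetermined jitter limit.
        lower = -min(max_back, jitter_limit)
        upper = min(max_forward, jitter_limit)
        amount = max(lower, min(upper, amount))
        shifted[i] = (start + amount, end + amount)
    return shifted


# Hypothetical visit groups, in days from an arbitrary origin.
groups = [(0, 10), (20, 30), (45, 50), (70, 80)]
jittered = jitter_groups(groups, jitter_limit=7, rng=random.Random(0))
print(jittered)
```

Shifting a whole group by one amount preserves the intervals between events inside the group, while the caps preserve the order of the groups.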
-
8.
Publication No.: US20230307104A1
Publication date: 2023-09-28
Application No.: US18324453
Filing date: 2023-05-26
Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose
IPC classification: G16H10/60
CPC classification: G16H10/60
Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence class distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.
-
9.
Publication No.: US20220253559A1
Publication date: 2022-08-11
Application No.: US17730592
Filing date: 2022-04-27
Inventors: Sean Rose, Weilong Song, Martin Scaiano
IPC classification: G06F21/62
Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, the method includes calculating the maximum time limits by which the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. The anonymized dataset is then returned.
-
10.
Publication No.: US11334685B2
Publication date: 2022-05-17
Application No.: US16802045
Filing date: 2020-02-26
Inventors: Sean Rose, Weilong Song, Martin Scaiano
Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, the method includes calculating the maximum time limits by which the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. The anonymized dataset is then returned.