Systems and Methods for Anonymizing Large Scale Datasets

    公开(公告)号:US20250077709A1

    公开(公告)日:2025-03-06

    申请号:US18955530

    申请日:2024-11-21

    Applicant: Google LLC

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Systems and methods for anonymizing large scale datasets

    公开(公告)号:US12164673B2

    公开(公告)日:2024-12-10

    申请号:US18345657

    申请日:2023-06-30

    Applicant: Google LLC

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Generating computationally-efficient representations of large datasets

    公开(公告)号:US11238357B2

    公开(公告)日:2022-02-01

    申请号:US16042975

    申请日:2018-07-23

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing large datasets using a computationally-efficient representation are disclosed. A request to apply a coverage algorithm to a large input dataset is received. The large dataset includes sets of elements. A computationally-efficient representation of the large dataset is generated by generating a reduced set of elements that contains fewer elements based on a defined probability. For each element in the reduced set, a determination is made regarding whether the element appears in more than a threshold number of sets. When the element appears in more than the threshold number, the element is removed from sets until the element appears in only the threshold number. The coverage algorithm is then applied to the computationally-efficient representation to identify a subset of the sets. The system provides data identifying the subset of the sets in response to the received request.

    Systems and methods for anonymizing large scale datasets

    公开(公告)号:US11727147B2

    公开(公告)日:2023-08-15

    申请号:US17016788

    申请日:2020-09-10

    Applicant: Google LLC

    CPC classification number: G06F21/6254 G06F16/285 G06N20/00

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Efficient on-device public-private computation

    公开(公告)号:US11574067B2

    公开(公告)日:2023-02-07

    申请号:US16774380

    申请日:2020-01-28

    Applicant: Google LLC

    Abstract: Example systems and methods enhance user privacy by performing efficient on-device public-private computation on a combination of public and private data, such as, for example, public and private graph data. In particular, the on-device public-private computation framework described herein can enable a device associated with an entity to efficiently compute a combined output that takes into account and is explicitly based upon a combination of data that is associated with the entity and data that is associated with one or more other entities that are private connections of the entity, all without revealing to a centralized computing system a set of locally stored private data that identifies the one or more other entities that are private connections of the entity.

    Systems and Methods for Anonymizing Large Scale Datasets

    公开(公告)号:US20220075897A1

    公开(公告)日:2022-03-10

    申请号:US17016788

    申请日:2020-09-10

    Applicant: Google LLC

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Efficient On-Device Public-Private Computation

    公开(公告)号:US20200242268A1

    公开(公告)日:2020-07-30

    申请号:US16774380

    申请日:2020-01-28

    Applicant: Google LLC

    Abstract: Example systems and methods enhance user privacy by performing efficient on-device public-private computation on a combination of public and private data, such as, for example, public and private graph data. In particular, the on-device public-private computation framework described herein can enable a device associated with an entity to efficiently compute a combined output that takes into account and is explicitly based upon a combination of data that is associated with the entity and data that is associated with one or more other entities that are private connections of the entity, all without revealing to a centralized computing system a set of locally stored private data that identifies the one or more other entities that are private connections of the entity.

    Systems and Methods for Anonymizing Large Scale Datasets

    公开(公告)号:US20230359769A1

    公开(公告)日:2023-11-09

    申请号:US18345657

    申请日:2023-06-30

    Applicant: Google LLC

    CPC classification number: G06F21/6254 G06N20/00 G06F16/285

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    GENERATING COMPUTATIONALLY-EFFICIENT REPRESENTATIONS OF LARGE DATASETS

    公开(公告)号:US20190026640A1

    公开(公告)日:2019-01-24

    申请号:US16042975

    申请日:2018-07-23

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing large datasets using a computationally-efficient representation are disclosed. A request to apply a coverage algorithm to a large input dataset is received. The large dataset includes sets of elements. A computationally-efficient representation of the large dataset is generated by generating a reduced set of elements that contains fewer elements based on a defined probability. For each element in the reduced set, a determination is made regarding whether the element appears in more than a threshold number of sets. When the element appears in more than the threshold number, the element is removed from sets until the element appears in only the threshold number. The coverage algorithm is then applied to the computationally-efficient representation to identify a subset of the sets. The system provides data identifying the subset of the sets in response to the received request.

Patent Agency Ranking