-
Publication No.: US11720548B1
Publication date: 2023-08-08
Application No.: US17205949
Application date: 2021-03-18
Applicant: Amazon Technologies, Inc.
Inventor: Daniel Opincariu, Yangbae Park, Sanjay Mathew Thomas
IPC: G06F16/23, G06F16/245, G06F21/62
CPC classification number: G06F16/2379, G06F16/245, G06F21/6245
Abstract: Systems, devices, and methods are provided for implementing shadow data lakes. In at least one embodiment, a deletion workflow obtains a deletion request from a delete request cache service, gets attestation details from an attestation service, submits a job to scan one or more records from a source table of a data lake and publish the one or more records to a deleted records table of a shadow data lake, and causes deletion of the one or more records from the data lake.
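The sequence of steps in this abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the function name, the `request_id` and `customer_id` fields, and the use of plain Python lists and dicts in place of the cache service, attestation service, data lake, and shadow data lake are all assumptions made for illustration.

```python
def run_deletion_workflow(request_cache, attestations, source_table, deleted_records):
    """Process one pending deletion request; return the number of records removed.

    All arguments are hypothetical in-memory stand-ins:
      request_cache   - list of pending requests (delete request cache service)
      attestations    - dict keyed by request_id (attestation service)
      source_table    - list of records (source table of the data lake)
      deleted_records - list of records (deleted records table of the shadow data lake)
    """
    # Obtain the deletion request and its attestation details.
    request = request_cache.pop(0)
    attestation = attestations[request["request_id"]]

    # "Job", part 1: scan the source table for records covered by the request.
    target = request["customer_id"]
    matches = [r for r in source_table if r["customer_id"] == target]

    # "Job", part 2: publish the matched records to the deleted records table.
    for record in matches:
        deleted_records.append({**record, "attestation": attestation})

    # Cause deletion of the matched records from the data lake (in place).
    source_table[:] = [r for r in source_table if r["customer_id"] != target]
    return len(matches)


# Example data, invented for this sketch.
cache = [{"request_id": "req-1", "customer_id": "c42"}]
attest = {"req-1": "approved-by-privacy-team"}
lake = [
    {"customer_id": "c42", "order": 1},
    {"customer_id": "c7", "order": 2},
    {"customer_id": "c42", "order": 3},
]
shadow = []
removed = run_deletion_workflow(cache, attest, lake, shadow)
```

In this sketch the records are copied to the shadow table before they are removed from the source, so the deleted records table always holds what was deleted and under which attestation.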
-
Publication No.: US11531666B1
Publication date: 2022-12-20
Application No.: US16998922
Application date: 2020-08-20
Applicant: Amazon Technologies, Inc.
Inventor: Yangbae Park, Laxmi Siva Prasad Balaramaraju Jalumari, Daniel Opincariu, Fletcher Liverance, Zhuonan Song
Abstract: Methods, systems, and computer-readable media for indexing partitions using distributed Bloom filters are disclosed. A data indexing system generates a plurality of indices for a plurality of partitions in a distributed object store. The indices comprise a plurality of Bloom filters. An individual one of the Bloom filters corresponds to one or more fields of an individual one of the partitions. Using the Bloom filters, the data indexing system determines a first portion of the partitions that possibly comprise a value and a second portion of the partitions that do not comprise the value. Based (at least in part) on a scan of the first portion of the partitions and not the second portion of the partitions, the data indexing system determines one or more partitions of the first portion of the partitions that comprise the value.
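The prune-then-scan idea in this abstract can be sketched as follows. This is an illustrative toy, not the patented system: the `BloomFilter` class, the salted SHA-256 hashing scheme, the filter size, and the `build_indices` and `find_partitions` helpers are assumptions, and in-memory dicts stand in for partitions of a distributed object store.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_indices(partitions, field):
    """Build one Bloom filter per partition over the given field."""
    indices = {}
    for name, records in partitions.items():
        bloom = BloomFilter()
        for record in records:
            bloom.add(record[field])
        indices[name] = bloom
    return indices


def find_partitions(partitions, indices, field, value):
    """Return the partitions that actually contain value in field."""
    # First portion: partitions whose filter says the value is possibly present.
    # Every other partition definitely lacks the value and is never scanned.
    candidates = [name for name, bloom in indices.items() if bloom.might_contain(value)]
    # Scan only the candidates to weed out Bloom filter false positives.
    return [
        name
        for name in candidates
        if any(record[field] == value for record in partitions[name])
    ]


# Example data, invented for this sketch.
partitions = {
    "p0": [{"user_id": "a"}, {"user_id": "b"}],
    "p1": [{"user_id": "c"}],
}
indices = build_indices(partitions, "user_id")
hits = find_partitions(partitions, indices, "user_id", "c")
```

Because a Bloom filter never reports a false negative, skipping the partitions it rejects cannot miss a match; the final scan is needed only to discard false positives among the candidates.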
-
Publication No.: US10248508B1
Publication date: 2019-04-02
Application No.: US14310429
Application date: 2014-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Yangbae Park, Jason Scott Flittner, Aaron John Seldon Steers
Abstract: A data validation service may validate data sets maintained for one or more data sources. Several rule sets may describe various rules used to validate one or more data sets. The rule sets may be automatically applied to respective data sets in order to validate the respective data sets according to a dynamically determined schedule for the application of the rule sets. Reporting events may be detected which correspond to a rule set. In response to detecting a reporting event, a responsive action may be performed as described in the rule set, such as providing notification of the reporting event.
-
-