Patent search ap:("Amazon Technologies, Inc.") AND inv:"Daniel Opincariu"

1.

Granted invention patent
Shadow data lakes (Active)

Publication number: US11720548B1

Publication date: 2023-08-08

Application number: US17205949

Filing date: 2021-03-18

Applicant: Amazon Technologies, Inc.

Inventor: Daniel Opincariu, Yangbae Park, Sanjay Mathew Thomas

IPC: G06F16/23, G06F16/245, G06F21/62

CPC classification number: G06F16/2379, G06F16/245, G06F21/6245

Abstract: Systems, devices, and methods are provided for implementing shadow data lakes. In at least one embodiment, a deletion workflow obtains a deletion request from a delete request cache service, gets attestation details from an attestation service, submits a job to scan one or more records from a source table of a data lake and publish the one or more records to a deleted records table of a shadow data lake, and causes deletion of the one or more records from the data lake.
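
The workflow in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain Python lists stand in for the data lake's source table and the shadow data lake's deleted records table, and the names `run_deletion_workflow`, `matches`, and `attestation` are hypothetical. Attaching the attestation details to each published record is also an assumption; the abstract only says the details are obtained.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def run_deletion_workflow(
    source_table: List[Record],
    deleted_records_table: List[Record],
    matches: Callable[[Record], bool],
    attestation: Dict[str, str],
) -> int:
    """Publish matching records to the shadow table, then delete them from the source.

    All names are hypothetical; lists stand in for the real tables.
    """
    # Scan the source table for records covered by the deletion request.
    to_delete = [r for r in source_table if matches(r)]

    # Publish each record to the deleted records table of the shadow data lake.
    # (Assumption: attestation details are stored alongside each record.)
    for record in to_delete:
        deleted_records_table.append({**record, **attestation})

    # Only after publishing, remove the records from the source table.
    source_table[:] = [r for r in source_table if not matches(r)]
    return len(to_delete)


source = [
    {"customer_id": "c1", "item": "book"},
    {"customer_id": "c2", "item": "lamp"},
    {"customer_id": "c1", "item": "pen"},
]
shadow: List[Record] = []
deleted = run_deletion_workflow(
    source,
    shadow,
    matches=lambda r: r["customer_id"] == "c1",
    attestation={"attested_by": "example-attestation"},
)
print(deleted, len(source), len(shadow))  # prints: 2 1 2
```

Publishing before deleting means a failure partway through leaves the records in the source table rather than losing them, which is one plausible reason for the ordering the abstract describes.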

2.

Granted invention patent
Indexing partitions using distributed bloom filters (Active)

Publication number: US11531666B1

Publication date: 2022-12-20

Application number: US16998922

Filing date: 2020-08-20

Applicant: Amazon Technologies, Inc.

Inventor: Yangbae Park, Laxmi Siva Prasad Balaramaraju Jalumari, Daniel Opincariu, Fletcher Liverance, Zhuonan Song

IPC: G06F16/20, G06F16/22, G06N5/04, G06N20/00, G06F16/23

Abstract: Methods, systems, and computer-readable media for indexing partitions using distributed Bloom filters are disclosed. A data indexing system generates a plurality of indices for a plurality of partitions in a distributed object store. The indices comprise a plurality of Bloom filters. An individual one of the Bloom filters corresponds to one or more fields of an individual one of the partitions. Using the Bloom filters, the data indexing system determines a first portion of the partitions that possibly comprise a value and a second portion of the partitions that do not comprise the value. Based (at least in part) on a scan of the first portion of the partitions and not the second portion of the partitions, the data indexing system determines one or more partitions of the first portion of the partitions that comprise the value.
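
The two-phase lookup described in this abstract (filter first, then scan only the candidates) can be illustrated with a small sketch. It assumes one Bloom filter per partition over a single field, an in-memory dictionary in place of the distributed object store, and hypothetical names (`BloomFilter`, `find_partitions`) and parameters (1024 bits, 3 hashes); none of these come from the patent.

```python
import hashlib
from typing import Dict, Iterable, List


class BloomFilter:
    """A small Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: str) -> Iterable[int]:
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def find_partitions(partitions: Dict[str, List[str]], value: str) -> List[str]:
    # Build one filter per partition (a real index would be precomputed).
    index: Dict[str, BloomFilter] = {}
    for name, values in partitions.items():
        bloom = BloomFilter()
        for v in values:
            bloom.add(v)
        index[name] = bloom

    # Phase 1: keep only partitions that possibly contain the value.
    candidates = [name for name, bloom in index.items() if bloom.might_contain(value)]

    # Phase 2: scan just the candidates to weed out false positives.
    return [name for name in candidates if value in partitions[name]]


partitions = {"p0": ["alice", "bob"], "p1": ["carol"], "p2": ["bob", "dave"]}
print(find_partitions(partitions, "bob"))  # prints: ['p0', 'p2']
```

Because a Bloom filter never reports a false negative, skipping the non-candidate partitions cannot miss a match; the confirming scan is needed only to discard false positives.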

3.

Granted invention patent
Efficient query optimization on distributed data sets (Active)

Publication number: US11816081B1

Publication date: 2023-11-14

Application number: US17205885

Filing date: 2021-03-18

Applicant: Amazon Technologies, Inc.

Inventor: Daniel Opincariu, Zhuonan Song

IPC: G06F16/22, G06F16/27, G06F16/2458, G06F16/2453

CPC classification number: G06F16/2228, G06F16/2462, G06F16/24532, G06F16/278

Abstract: Systems, devices, and methods are provided for efficient query execution on distributed data sets, such as in the context of data lakes. In at least one embodiment, indexing information is used to identify candidate and non-candidate portions of a data set. Non-candidate portions may be irrelevant to the query. Indexing information can be encoded using Bloom filters.

4.

Granted invention patent
Federated execution of data lake processes (Active)

Publication number: US12277134B1

Publication date: 2025-04-15

Application number: US18478274

Filing date: 2023-09-29

Applicant: Amazon Technologies, Inc.

Inventor: Daniel Opincariu, Rajasuba Subramanian, Arnab Dutta, Deepan Chakravarthy Vijayarangam, Ranil Pavithran Muzhangathu, Anas Fattahi

IPC: G06F16/20, G06F16/25

Abstract: In a data lake, a control data object is defined. The control object defines the processes and relationships of processes associated with a data set in the data lake. The control has states that are tied to and adapt in response to state changes of the associated data set. A control can have a control type. The system automatically carries forward enabled processes from one data set version to the next data set version. The system uses the control definition to execute processes, such as compaction or data quality scans, on data sets in the data lake.
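
One way to picture the control object and the carry-forward behavior in this abstract is the sketch below. Everything in it is an assumption for illustration: the `Control` class, its fields, the state names `ACTIVE` and `SUPERSEDED`, the control type `maintenance`, and the disabled `archival` process are hypothetical, and `run` merely returns strings where a real system would execute the processes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Control:
    """Hypothetical control object tying processes to one data set version."""

    dataset_version: int
    control_type: str
    # Maps a process name (e.g. "compaction") to whether it is enabled.
    processes: Dict[str, bool] = field(default_factory=dict)
    state: str = "ACTIVE"

    def carry_forward(self, new_version: int) -> "Control":
        # Only enabled processes are carried to the next data set version.
        enabled = {name: True for name, on in self.processes.items() if on}
        # The old control's state follows the data set's state change.
        self.state = "SUPERSEDED"
        return Control(new_version, self.control_type, enabled)

    def run(self) -> List[str]:
        # Stand-in for executing each enabled process on the data set.
        return [
            f"{name} on v{self.dataset_version}"
            for name, on in sorted(self.processes.items())
            if on
        ]


v1 = Control(
    1,
    "maintenance",
    {"compaction": True, "data_quality_scan": True, "archival": False},
)
v2 = v1.carry_forward(2)
print(v1.state, v2.run())
# prints: SUPERSEDED ['compaction on v2', 'data_quality_scan on v2']
```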

5.

Granted invention patent
Data retention management for partitioned datasets (Active)

Publication number: US12072868B1

Publication date: 2024-08-27

Application number: US17224987

Filing date: 2021-04-07

Applicant: Amazon Technologies, Inc.

Inventor: Daniel Opincariu, Sandeep Joshi

IPC: G06F16/23, G06F16/22

CPC classification number: G06F16/2379, G06F16/2228

Abstract: Systems and methods are disclosed to implement a data storage system that manages data retention for partitioned datasets. A received data retention policy specifies that data be selectively deleted from a dataset based on a set of data retention attributes. If the data retention attributes are part of the dataset's partition key, a first type of data deletion job is configured to selectively delete entire partitions of the dataset. Otherwise, the system will generate a retention attribute index for the dataset, which will be used by a second type of data deletion job to selectively delete individual records within the partitions. In embodiments, the retention attribute index is implemented as Bloom filters that track retention attribute values in each partition. Advantageously, the disclosed system is able to automatically configure, for any dataset schema, deletion jobs that avoid full scans of the dataset partitions.
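
The choice between the two deletion job types can be sketched as below. This is a simplified illustration, not the disclosed system: the function names are hypothetical, partitions are an in-memory dictionary, and exact Python sets stand in for the per-partition Bloom filters (so, unlike a real Bloom filter index, this one has no false positives).

```python
from typing import Dict, List, Set

Record = Dict[str, str]
Partitions = Dict[str, List[Record]]


def choose_job_type(retention_attrs: Set[str], partition_key: Set[str]) -> str:
    # If every retention attribute is in the partition key, whole partitions
    # can be deleted; otherwise records must be deleted individually.
    return "partition" if retention_attrs <= partition_key else "record"


def build_retention_index(partitions: Partitions, attr: str) -> Dict[str, Set[str]]:
    # Track which retention attribute values occur in each partition.
    # (Assumption: exact sets replace the Bloom filters named in the abstract.)
    return {name: {r[attr] for r in records} for name, records in partitions.items()}


def delete_records(
    partitions: Partitions, index: Dict[str, Set[str]], attr: str, expired: str
) -> int:
    deleted = 0
    for name, values in index.items():
        if expired not in values:
            continue  # the index rules this partition out, so it is never scanned
        kept = [r for r in partitions[name] if r[attr] != expired]
        deleted += len(partitions[name]) - len(kept)
        partitions[name] = kept
    return deleted


partitions: Partitions = {
    "date=2021-01-01": [{"tenant": "a"}, {"tenant": "b"}],
    "date=2021-01-02": [{"tenant": "b"}],
}
print(choose_job_type({"date"}, {"date"}))    # prints: partition
print(choose_job_type({"tenant"}, {"date"}))  # prints: record
index = build_retention_index(partitions, "tenant")
print(delete_records(partitions, index, "tenant", "a"))  # prints: 1
```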
