-
Publication No.: US11531666B1
Publication Date: 2022-12-20
Application No.: US16998922
Filing Date: 2020-08-20
Applicant: Amazon Technologies, Inc.
Inventor: Yangbae Park, Laxmi Siva Prasad Balaramaraju Jalumari, Daniel Opincariu, Fletcher Liverance, Zhuonan Song
Abstract: Methods, systems, and computer-readable media for indexing partitions using distributed Bloom filters are disclosed. A data indexing system generates a plurality of indices for a plurality of partitions in a distributed object store. The indices comprise a plurality of Bloom filters. An individual one of the Bloom filters corresponds to one or more fields of an individual one of the partitions. Using the Bloom filters, the data indexing system determines a first portion of the partitions that possibly comprise a value and a second portion of the partitions that do not comprise the value. Based (at least in part) on a scan of the first portion of the partitions and not the second portion of the partitions, the data indexing system determines one or more partitions of the first portion of the partitions that comprise the value.
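The abstract does not specify hash functions, filter sizes, or storage layout. The following is a minimal Python sketch of the general technique, assuming one in-memory Bloom filter per partition for a single field and SHA-256-based double hashing. The names `BloomFilter`, `build_index`, and `find_partitions` are hypothetical and not taken from the patent. Partitions whose filter reports "definitely absent" form the second portion and are never scanned. Partitions that "possibly" contain the value form the first portion and are scanned to remove false positives.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; k bit positions derived from SHA-256 via double hashing."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, value: str):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force an odd step so positions cycle through all bits when num_bits is a power of two.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_index(partitions: dict[str, list[dict[str, str]]], field: str) -> dict[str, BloomFilter]:
    """Build one Bloom filter per partition over the given field (hypothetical helper)."""
    index = {}
    for name, rows in partitions.items():
        bf = BloomFilter()
        for row in rows:
            if field in row:
                bf.add(row[field])
        index[name] = bf
    return index


def find_partitions(partitions, index, field, value):
    """Return (first portion: possible matches, confirmed matches after scanning only that portion)."""
    candidates = [name for name, bf in index.items() if bf.might_contain(value)]
    # Partitions not in `candidates` definitely lack the value and are skipped without a scan.
    confirmed = [
        name for name in candidates
        if any(row.get(field) == value for row in partitions[name])
    ]
    return candidates, confirmed


partitions = {
    "p0": [{"user": "alice"}, {"user": "bob"}],
    "p1": [{"user": "carol"}],
    "p2": [{"user": "alice"}, {"user": "dave"}],
}
index = build_index(partitions, "user")
candidates, matches = find_partitions(partitions, index, "user", "alice")
print(matches)  # ['p0', 'p2']
```

Because a Bloom filter has no false negatives, every partition that truly holds the value is guaranteed to be in the scanned portion. A false positive costs only an unnecessary scan, never a wrong answer.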
-
Publication No.: US11816081B1
Publication Date: 2023-11-14
Application No.: US17205885
Filing Date: 2021-03-18
Applicant: Amazon Technologies, Inc.
Inventor: Daniel Opincariu, Zhuonan Song
IPC: G06F16/22, G06F16/27, G06F16/2458, G06F16/2453
CPC classification number: G06F16/2228, G06F16/2462, G06F16/24532, G06F16/278
Abstract: Systems, devices, and methods are provided for efficient query execution on distributed data sets, such as in the context of data lakes. In at least one embodiment, indexing information is used to identify candidate and non-candidate portions of a data set. Non-candidate portions may be irrelevant to the query. Indexing information can be encoded using Bloom filters.
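The abstract leaves the query planner unspecified. Below is a minimal Python sketch, assuming equality predicates combined with AND, one Bloom filter per column per data-set portion stored as an integer bit array, and hypothetical names (`Portion`, `plan`, `execute`) chosen only for illustration. A portion is classified as non-candidate as soon as any predicate's filter reports the value as definitely absent, and only candidate portions are scanned.

```python
import hashlib
from dataclasses import dataclass, field

M_BITS = 512   # hypothetical filter size in bits
K_HASHES = 3   # hypothetical number of hash functions


def _bit_positions(value: str) -> list[int]:
    # Derive K positions by hashing the value with a per-hash-function prefix.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{value}".encode()).digest()[:8], "big") % M_BITS
        for i in range(K_HASHES)
    ]


def make_filter(values) -> int:
    bits = 0
    for v in values:
        for p in _bit_positions(v):
            bits |= 1 << p
    return bits


def may_contain(bits: int, value: str) -> bool:
    return all((bits >> p) & 1 for p in _bit_positions(value))


@dataclass
class Portion:
    name: str
    rows: list[dict[str, str]]
    filters: dict[str, int] = field(default_factory=dict)

    def build_filters(self) -> None:
        # One Bloom filter per column that appears in this portion.
        columns = {c for row in self.rows for c in row}
        self.filters = {c: make_filter(r[c] for r in self.rows if c in r) for c in columns}


def plan(portions, predicates: dict[str, str]):
    """Split portions into candidates and non-candidates using only the index."""
    candidates, non_candidates = [], []
    for portion in portions:
        # A missing column yields an empty filter (0), so the portion is non-candidate.
        if all(may_contain(portion.filters.get(col, 0), val) for col, val in predicates.items()):
            candidates.append(portion)
        else:
            non_candidates.append(portion)
    return candidates, non_candidates


def execute(portions, predicates: dict[str, str]):
    candidates, _ = plan(portions, predicates)
    # Scan only candidate portions; the exact check removes Bloom-filter false positives.
    return [
        row for p in candidates for row in p.rows
        if all(row.get(c) == v for c, v in predicates.items())
    ]


portions = [
    Portion("2021-03-01", [{"region": "us", "status": "ok"}, {"region": "eu", "status": "error"}]),
    Portion("2021-03-02", [{"region": "ap", "status": "ok"}]),
]
for p in portions:
    p.build_filters()
print(execute(portions, {"region": "eu", "status": "error"}))
# [{'region': 'eu', 'status': 'error'}]
```

In a data lake the filters would typically be far smaller than the portions they summarize, so the planning step can be cheap relative to scanning. The actual storage format and sizing here are assumptions.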
-