Memory-efficient streaming count estimation for multisets

    公开(公告)号:US11314730B1

    公开(公告)日:2022-04-26

    申请号:US16828188

    申请日:2020-03-24

    Abstract: Techniques for memory-efficient streaming count estimation for multisets are described. A method for memory-efficient streaming count estimation for multisets may include obtaining data from a plurality of data sources, and estimating a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table.

    Scaling record linkage via elimination of highly overlapped blocks

    公开(公告)号:US11113254B1

    公开(公告)日:2021-09-07

    申请号:US16587902

    申请日:2019-09-30

    Abstract: Techniques for scaling record linkage via elimination of highly overlapped blocks are described. A method for scaling record linkage via elimination of highly overlapped blocks includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, identifying a plurality of sets of matching blocks from the first plurality of blocks, deleting the plurality of sets of matching blocks except for a first block from each set from the plurality of sets of matching blocks, and iteratively performing dynamic blocking based at least on the first block to generate subsequent pluralities of blocks until the subsequent pluralities of blocks are below a threshold size.

    Scalable parallel elimination of approximately subsumed sets

    公开(公告)号:US11086940B1

    公开(公告)日:2021-08-10

    申请号:US16588296

    申请日:2019-09-30

    Abstract: Techniques for Scalable parallel elimination of approximately subsumed sets are described. A method for Scalable parallel elimination of approximately subsumed sets includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, determining a plurality of subsumption relationships between blocks from the first plurality of blocks, retaining a first subset of the first plurality of blocks and demoting a second subset of the first plurality of blocks based at least on the plurality of subsumption relationships, and iteratively performing dynamic blocking based at least on the first subset of the plurality of matching blocks and the second subset of the plurality of matching blocks to generate a subsequent pluralities of blocks.

    Intersection-based dynamic blocking

    公开(公告)号:US10599614B1

    公开(公告)日:2020-03-24

    申请号:US15860393

    申请日:2018-01-02

    Abstract: Block size reduction iterations are performed on a plurality of blocks of records until a block size criterion is met. An iteration comprises identifying, from a first collection of blocks, using one or more pivot operations, a set of combinations of oversized blocks such that at least one record belongs to all blocks of a combination. A new block comprising records that are members of each block of a first combination of the set is included in a second collection of blocks to be examined in a subsequent iteration. On at least one block created in an iteration, analysis operations are performed.

Patent Agency Ranking