Transliteration of data records for improved data matching

    Publication No.: US11120064B2

    Publication date: 2021-09-14

    Application No.: US16197222

    Filing date: 2018-11-20

    Abstract: A data records service is configured to receive original data records and, in parallel, generate a transliterated version of each original data record in a phonetic-based language. Individual fields of data records can be transliterated by identifying a primary language, generating language-specific tokens for individual text portions, and transliterating the tokens. The records processing service can then execute matching models on both the original data records and the transliterated data records to detect matching data records.
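The flow in this abstract (identify a primary language, tokenise, transliterate, then match on both versions) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the patented method: the tiny `CYRILLIC_TO_LATIN` table, script detection via Unicode character names, whitespace tokenisation, and exact string equality standing in for a matching model.

```python
import unicodedata

# Hypothetical, deliberately tiny Cyrillic-to-Latin table (illustration only).
CYRILLIC_TO_LATIN = {"а": "a", "в": "v", "и": "i", "н": "n", "о": "o"}


def primary_script(text):
    """Return the most frequent Unicode script name among the letters of text."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"


def transliterate_field(text):
    """Lower-case a field and, if it is mostly Cyrillic, map it token by token."""
    tokens = text.lower().split()  # assumed tokeniser: whitespace split
    if primary_script(text) != "CYRILLIC":
        return " ".join(tokens)
    return " ".join(
        "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in token) for token in tokens
    )


def transliterate_record(record):
    """Build the transliterated counterpart of a record (dict of field -> text)."""
    return {field: transliterate_field(value) for field, value in record.items()}


def records_match(a, b, fields):
    """Match if the originals agree or the transliterated versions agree."""
    original_hit = all(a[f] == b[f] for f in fields)
    ta, tb = transliterate_record(a), transliterate_record(b)
    transliterated_hit = all(ta[f] == tb[f] for f in fields)
    return original_hit or transliterated_hit


if __name__ == "__main__":
    left = {"name": "Иван Иванов"}
    right = {"name": "Ivan Ivanov"}
    print(records_match(left, right, ["name"]))
```

The two records differ as raw strings but agree once both are transliterated, which is the case the second matching pass is meant to catch.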

    Scaling record linkage via elimination of highly overlapped blocks

    Publication No.: US11113254B1

    Publication date: 2021-09-07

    Application No.: US16587902

    Filing date: 2019-09-30

    Abstract: Techniques for scaling record linkage via elimination of highly overlapped blocks are described. A method for scaling record linkage via elimination of highly overlapped blocks includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, identifying a plurality of sets of matching blocks from the first plurality of blocks, deleting every block in each set of matching blocks except for a first block from that set, and iteratively performing dynamic blocking based at least on the first block to generate subsequent pluralities of blocks until the subsequent pluralities of blocks are below a threshold size.
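As a rough illustration of the steps named in this abstract, the sketch below builds blocks from record fields, keeps one block per group of matching blocks, and then repeatedly intersects oversized blocks. Several points are assumptions, not details from the patent: "matching" is taken to mean identical membership, sub-blocking is done by pairwise intersection, and `max_size` and `max_rounds` are hypothetical parameters.

```python
from itertools import combinations


def build_blocks(records, keys):
    """Group record ids by (field, value); each group is one block."""
    blocks = {}
    for rid, rec in records.items():
        for key in keys:
            blocks.setdefault((key, rec[key]), set()).add(rid)
    return blocks


def drop_duplicate_blocks(blocks):
    """Keep only the first block from each group with identical membership."""
    first_by_members = {}
    for name, members in blocks.items():
        first_by_members.setdefault(frozenset(members), name)
    return {name: set(members) for members, name in first_by_members.items()}


def dynamic_blocking(blocks, max_size, max_rounds=5):
    """Accept small blocks; intersect oversized ones until none remain."""
    accepted = {}
    current = drop_duplicate_blocks(blocks)
    for _ in range(max_rounds):
        oversized = {}
        for name, members in current.items():
            if len(members) <= max_size:
                accepted[name] = members
            else:
                oversized[name] = members
        if len(oversized) < 2:
            break  # nothing left to intersect; a lone oversized block is dropped
        narrowed = {}
        for (n1, m1), (n2, m2) in combinations(oversized.items(), 2):
            common = m1 & m2
            if len(common) > 1:  # a block needs at least two records to compare
                narrowed[(n1, n2)] = common
        current = drop_duplicate_blocks(narrowed)
    return accepted


if __name__ == "__main__":
    blocks = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {1, 2, 3, 4}}
    print(dynamic_blocking(blocks, max_size=3))
```

In this toy input, block `c` duplicates `a` and is removed before any intersection work, so only one pair of oversized blocks has to be intersected instead of three.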

    Scalable parallel elimination of approximately subsumed sets

    Publication No.: US11086940B1

    Publication date: 2021-08-10

    Application No.: US16588296

    Filing date: 2019-09-30

    Abstract: Techniques for scalable parallel elimination of approximately subsumed sets are described. A method for scalable parallel elimination of approximately subsumed sets includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, determining a plurality of subsumption relationships between blocks from the first plurality of blocks, retaining a first subset of the first plurality of blocks and demoting a second subset of the first plurality of blocks based at least on the plurality of subsumption relationships, and iteratively performing dynamic blocking based at least on the first subset of the plurality of matching blocks and the second subset of the plurality of matching blocks to generate subsequent pluralities of blocks.
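The retain/demote split described in this abstract can be sketched as follows. The definition of approximate subsumption used here (the share of a block's records that also appear in a retained block meets a `threshold`) is an assumption, as are the greedy largest-first ordering and the function name; the sketch is also sequential, whereas the patent title refers to a parallel technique.

```python
def partition_by_subsumption(blocks, threshold=0.9):
    """Split blocks into (retained, demoted) by approximate subsumption.

    A block is demoted when at least `threshold` of its records already
    appear in some retained block. Larger blocks are considered first, and
    ties are broken by name so the result is deterministic.
    """
    retained, demoted = {}, {}
    ordered = sorted(blocks.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for name, members in ordered:
        if not members:
            demoted[name] = members  # an empty block carries no candidate pairs
            continue
        covered = any(
            len(members & kept) / len(members) >= threshold
            for kept in retained.values()
        )
        if covered:
            demoted[name] = members
        else:
            retained[name] = members
    return retained, demoted


if __name__ == "__main__":
    blocks = {
        "a": set(range(1, 11)),
        "b": {1, 2, 3, 4, 5, 6, 7, 8, 99},
        "c": {20, 21},
    }
    retained, demoted = partition_by_subsumption(blocks, threshold=0.8)
    print(sorted(retained), sorted(demoted))
```

Demoted blocks are returned rather than discarded, since the abstract states that both subsets feed the later dynamic blocking rounds.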
