Memory-efficient streaming count estimation for multisets

    Publication number: US11314730B1

    Publication date: 2022-04-26

    Application number: US16828188

    Application date: 2020-03-24

    Abstract: Techniques for memory-efficient streaming count estimation for multisets are described. A method for memory-efficient streaming count estimation for multisets may include obtaining data from a plurality of data sources, and estimating a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table.
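    The abstract describes the telescoping idea only at a high level: small counters in a first table, with saturated cells spilling into a corresponding cell of a second table. The Python sketch below is one minimal way to realize that idea and is not the patented implementation. The class name `TelescopingCMS`, the counter cap `cap1`, the width `ratio` between the tables, the `j // ratio` mapping from a first-table cell to its second-table cell, and the SHA-256 row hashing are all hypothetical choices.

```python
import hashlib


class TelescopingCMS:
    """Hypothetical two-level count-min sketch.

    Table 1: `depth` rows x `width1` small counters, each capped at `cap1`.
    Table 2: `depth` rows x `width1 // ratio` wider counters. Once a table-1
    cell j is saturated, further increments go to table-2 cell j // ratio.
    """

    def __init__(self, depth=4, width1=1024, cap1=255, ratio=8):
        if width1 % ratio:
            raise ValueError("width1 must be a multiple of ratio")
        self.depth, self.width1, self.cap1, self.ratio = depth, width1, cap1, ratio
        # Python ints are used for clarity; a compact version would store
        # table 1 as e.g. 8-bit counters and only table 2 as wide counters.
        self.t1 = [[0] * width1 for _ in range(depth)]
        self.t2 = [[0] * (width1 // ratio) for _ in range(depth)]

    def _index(self, row, item):
        # Derive a per-row hash by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width1

    def add(self, item, count=1):
        if count < 0:
            raise ValueError("count must be non-negative")
        for r in range(self.depth):
            j = self._index(r, item)
            fits = min(self.cap1 - self.t1[r][j], count)
            self.t1[r][j] += fits
            # The part that does not fit telescopes into the coarser table 2.
            self.t2[r][j // self.ratio] += count - fits

    def estimate(self, item):
        estimates = []
        for r in range(self.depth):
            j = self._index(r, item)
            value = self.t1[r][j]
            if value == self.cap1:  # saturated: add the overflow cell
                value += self.t2[r][j // self.ratio]
            estimates.append(value)
        # As in a plain count-min sketch, the minimum over rows is reported.
        return min(estimates)


cms = TelescopingCMS(depth=2, width1=16, cap1=3, ratio=4)
for _ in range(10):
    cms.add("a")
print(cms.estimate("a"))  # 10 (3 held in table 1, 7 overflowed to table 2)
```

    Because several first-table cells share one second-table cell, collisions in the second table can only inflate an estimate, so this sketch keeps the one-sided error of an ordinary count-min sketch: it may overestimate a count but never underestimates it.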

    Row level security in natural language question answering

    Publication number: US12223080B1

    Publication date: 2025-02-11

    Application number: US18070086

    Application date: 2022-11-28

    Abstract: This disclosure describes a natural language question (NLQ) query service within a service provider network that provides row level security (RLS) for autocomplete during entry of NLQs and for fuzzy matching in NLQ answering. RLS rules take the form of per-user predicates, such as “Tim can only see rows with region=US.” In some configurations, a complex extraction and preprocessing pipeline is used to extract distinct combinations of values against RLS predicate “rule keys.” Those distinct values are indexed along with grouped rule keys so that predicates can be pushed down at autocomplete time. This moves part of RLS rule handling to the ingestion time of a dataset instead of handling all of it at query time, which helps meet latency goals. In some configurations, a single logical document of unique cell values is split into multiple documents, each with a subset of the rule keys, to handle scalability limits.
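    The split between ingestion-time and query-time work can be illustrated with a small Python sketch. It assumes a single rule-key column (`region`), treats each distinct cell value as an indexed entry tagged with the rule keys it appears under, and models the scalability split as breaking one value's rule-key set into several documents of at most `max_keys_per_doc` keys. The function names, the document shape, and the `rls_rules` mapping are hypothetical; a real service would use a search index rather than a Python list.

```python
from collections import defaultdict


def build_index(rows, value_col, rule_key_col, max_keys_per_doc=2):
    """Ingestion time (hypothetical): tag each distinct cell value with the
    rule keys it appears under, then split large key sets across documents."""
    keys_by_value = defaultdict(set)
    for row in rows:
        keys_by_value[row[value_col]].add(row[rule_key_col])
    docs = []
    for value, keys in keys_by_value.items():
        ordered = sorted(keys)
        # One logical entry becomes several documents, each with a key subset.
        for i in range(0, len(ordered), max_keys_per_doc):
            chunk = frozenset(ordered[i:i + max_keys_per_doc])
            docs.append({"value": value, "rule_keys": chunk})
    return docs


def autocomplete(docs, prefix, allowed_keys):
    """Query time: the user's RLS predicate is applied as a rule-key filter
    on the pre-built index instead of being evaluated against raw rows."""
    prefix = prefix.lower()
    hits = {
        d["value"]
        for d in docs
        if d["value"].lower().startswith(prefix) and d["rule_keys"] & allowed_keys
    }
    return sorted(hits)


rows = [
    {"product": "Widget", "region": "US"},
    {"product": "Wrench", "region": "EU"},
    {"product": "Wire", "region": "US"},
    {"product": "Wire", "region": "EU"},
    {"product": "Wire", "region": "APAC"},
]
rls_rules = {"Tim": {"US"}}  # hypothetical per-user rule: region=US
docs = build_index(rows, "product", "region")
print(autocomplete(docs, "w", rls_rules["Tim"]))  # ['Widget', 'Wire']
```

    In this sketch the query-time RLS check reduces to one set intersection per candidate document, and "Wire" is stored as two documents because its three rule keys exceed `max_keys_per_doc`.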

    Transliteration of data records for improved data matching

    Publication number: US20200159857A1

    Publication date: 2020-05-21

    Application number: US16197222

    Application date: 2018-11-20

    Abstract: A data records service is configured to receive original data records and, in parallel, generate a transliterated version of each original data record in a phonetic-based language. Individual fields of data records can be transliterated by identifying a primary language, generating language-specific tokens for individual text portions, and transliterating the tokens. The records processing service can then execute matching models on both the original and the transliterated data records to detect matching data records.
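    A toy version of the transliterate-then-match flow is sketched below in Python. The primary language is approximated by the dominant Unicode script of a field, the tokenizer is a whitespace split, the Cyrillic-to-Latin table is a deliberately tiny hypothetical example, and exact token comparison stands in for the patent's matching models. The parallel generation of transliterated records mentioned in the abstract is not shown.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (Cyrillic -> Latin).
# A real system would use full per-language schemes; this covers the demo only.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "и": "i",
    "й": "y", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ш": "sh",
}
TABLES = {"cyrillic": CYRILLIC_TO_LATIN}


def primary_script(text):
    """Guess a field's primary 'language' from its dominant Unicode script."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0].lower()
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "latin"


def transliterate_field(text):
    """Tokenize, then transliterate each token with the primary script's table."""
    table = TABLES.get(primary_script(text), {})
    tokens = text.lower().split()
    return tuple("".join(table.get(ch, ch) for ch in tok) for tok in tokens)


def records_match(a, b, fields):
    """Match on the original fields OR on their transliterated versions."""
    original = all(a[f].lower().split() == b[f].lower().split() for f in fields)
    translit = all(transliterate_field(a[f]) == transliterate_field(b[f]) for f in fields)
    return original or translit


a = {"name": "Иван Петров", "city": "Москва"}
b = {"name": "Ivan Petrov", "city": "Moskva"}
c = {"name": "Ivan Sidorov", "city": "Moskva"}
print(records_match(a, b, ["name", "city"]))  # True
print(records_match(a, c, ["name", "city"]))  # False
```

    Matching on either representation lets a record entered in Cyrillic pair with its Latin-script counterpart, while records that already share a script still match on their original fields.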
