-
Publication No.: US11314730B1
Publication date: 2022-04-26
Application No.: US16828188
Filing date: 2020-03-24
Applicant: Amazon Technologies, Inc.
Inventor: Andrew Borthwick, Stephen Michael Ash
Abstract: Techniques for memory-efficient streaming count estimation for multisets are described. A method for memory-efficient streaming count estimation for multisets may include obtaining data from a plurality of data sources, and estimating a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table.
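
The two-table idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `TelescopingCMS`, the saturation limit `small_max`, the table sizes, the BLAKE2b row hashes, and the `col % overflow_width` mapping from a first-table cell to its second-table cell are all assumptions chosen for illustration.

```python
import hashlib


class TelescopingCMS:
    """Illustrative two-level count-min sketch (all parameters are assumptions)."""

    def __init__(self, depth=4, width=1024, small_max=15, overflow_width=128):
        self.depth = depth                    # number of hash rows
        self.width = width                    # cells per row in the first table
        self.small_max = small_max            # saturation value of a first-table cell
        self.overflow_width = overflow_width  # cells per row in the second table
        self.first = [[0] * width for _ in range(depth)]
        self.second = [[0] * overflow_width for _ in range(depth)]

    def _col(self, item, row):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            col = self._col(item, row)
            room = self.small_max - self.first[row][col]
            to_first = min(count, room)
            self.first[row][col] += to_first
            spill = count - to_first
            if spill:
                # The first-table cell is saturated: the remainder goes to the
                # second-table cell derived from the first-table cell index.
                self.second[row][col % self.overflow_width] += spill

    def estimate(self, item):
        best = None
        for row in range(self.depth):
            col = self._col(item, row)
            value = self.first[row][col]
            if value == self.small_max:
                # Saturated cells also read their overflow cell.
                value += self.second[row][col % self.overflow_width]
            best = value if best is None else min(best, value)
        return best


sketch = TelescopingCMS()
sketch.add("user-42", 100)
print(sketch.estimate("user-42"))  # 100 while no other item has been added
```

As with an ordinary count-min sketch, the estimate never undercounts. Most cells stay below `small_max`, so only the few heavy cells consume space in the second table, which is where the memory saving in this sketch comes from.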
-
Publication No.: US12223080B1
Publication date: 2025-02-11
Application No.: US18070086
Filing date: 2022-11-28
Applicant: Amazon Technologies, Inc.
Inventor: Amjad Al-Rikabi, Stephen Michael Ash, William Michael Siler, Rajkumar Haridoss, Rajesh Patel, Kushal Yelamali
IPC: G06F16/00, G06F16/242, G06F16/2455, G06F21/62, G06F16/13
Abstract: This disclosure describes a natural language question (NLQ) query service within a service provider network that provides row level security (RLS) for autocomplete during entry of NLQs and for fuzzy matching in NLQ answering. The rules take the form of per-user predicates such as "Tim can only see rows with region=US". In configurations, an extraction and preprocessing pipeline is used to extract distinct combinations of values against RLS predicate “rule keys”. Those distinct values are indexed along with grouped rule keys to enable pushing down predicates at autocomplete time. This moves part of RLS rule handling to the ingestion time of a dataset rather than handling all RLS rules at query time, which helps meet latency goals. In configurations, a single logical document of unique cell values is split into multiple documents, each with a subset of rule keys, to handle scalability limits.
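
The ingestion-time indexing described in this abstract can be sketched as follows. This is a simplified illustration, not the service's pipeline: the function names `build_index` and `autocomplete`, the `column=value` string form of a rule key, and the sample rows are assumptions, and only single-predicate rules are handled.

```python
from collections import defaultdict


def build_index(rows, value_column, rule_columns):
    """Ingestion time: map each distinct cell value to the rule keys that expose it."""
    index = defaultdict(set)
    for row in rows:
        for col in rule_columns:
            # Hypothetical rule-key encoding, e.g. "region=US".
            index[row[value_column]].add(f"{col}={row[col]}")
    return index


def autocomplete(index, prefix, user_rule_keys):
    """Query time: a prefix match plus a set intersection, with no scan of the rows."""
    prefix = prefix.lower()
    return sorted(
        value
        for value, keys in index.items()
        if value.lower().startswith(prefix) and keys & user_rule_keys
    )


rows = [
    {"customer": "Acme", "region": "US"},
    {"customer": "Acorn", "region": "EU"},
    {"customer": "Apex", "region": "US"},
]
index = build_index(rows, "customer", ["region"])
print(autocomplete(index, "ac", {"region=US"}))  # ['Acme']
```

Because the rule keys are stored next to each distinct value, a user restricted to `region=US` never receives `Acorn` as a suggestion, and the check costs one set intersection per candidate value.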
-
Publication No.: US11726997B2
Publication date: 2023-08-15
Application No.: US18055384
Filing date: 2022-11-14
Applicant: Amazon Technologies, Inc.
Inventor: Jun Wang, Zhiguo Wang, Sharanabasappa Parashuram Revadigar, Ramesh M Nallapati, Bing Xiang, Stephen Michael Ash, Timothy Jones, Sudipta Sengupta, Rishav Chakravarti, Patrick Ng, Jiarong Jiang, Hanbo Li, Donald Harold Rivers Weidner
IPC: G06F7/00, G06F16/2452, G06F40/295, G06N20/00, G06F16/242
CPC classification number: G06F16/24522, G06F16/243, G06F40/295, G06N20/00
Abstract: Multiple stage filtering may be implemented for natural language query processing pipelines. Natural language queries may be received at a natural language query processing system and processed through a query language processing pipeline. The query language processing pipeline may filter candidate linkages for a natural language query before performing further filtering of the candidate linkages in the natural language query processing pipeline as part of generating an intermediate representation used to execute the natural language query.
-
Publication No.: US20230078177A1
Publication date: 2023-03-16
Application No.: US18055384
Filing date: 2022-11-14
Applicant: Amazon Technologies, Inc.
Inventor: Jun Wang, Zhiguo Wang, Sharanabasappa Parashuram Revadigar, Ramesh M Nallapati, Bing Xiang, Stephen Michael Ash, Timothy Jones, Sudipta Sengupta, Rishav Chakravarti, Patrick Ng, Jiarong Jiang, Hanbo Li, Donald Harold Rivers Weidner
IPC: G06F16/2452, G06F40/295, G06N20/00, G06F16/242
Abstract: Multiple stage filtering may be implemented for natural language query processing pipelines. Natural language queries may be received at a natural language query processing system and processed through a query language processing pipeline. The query language processing pipeline may filter candidate linkages for a natural language query before performing further filtering of the candidate linkages in the natural language query processing pipeline as part of generating an intermediate representation used to execute the natural language query.
-
Publication No.: US11604794B1
Publication date: 2023-03-14
Application No.: US17219689
Filing date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Ramesh M Nallapati, Zhiguo Wang, Bing Xiang, Patrick Ng, Yung Haw Wang, Mukul Karnik, Nanyan Li, Sharanabasappa Parashuram Revadigar, Timothy Jones, Stephen Michael Ash, Sudipta Sengupta, Gregory David Adams, Deepak Shantha Murthy, Douglas Scott Cerny, Stephanie Weeks, Hanbo Li
IPC: G06F16/245, G06F16/2452, G06F16/242, G06F40/295, G06N20/00
Abstract: Interactive assistances for executing natural language queries to data sets may be performed. A natural language query may be received. Candidate entity linkages may be determined between an entity recognized in the natural language query and columns in data sets. The candidate linkages may be ranked according to confidence scores which may be evaluated to detect ambiguity for an entity linkage. Candidate entity linkages may be provided to a user via an interface to select an entity linkage to use as part of completing the natural language query.
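
The ranking and ambiguity check in this abstract can be sketched as below. The abstract does not say how ambiguity is detected, so the criterion used here (the top two confidence scores lying within a fixed `margin`) is an assumption, as are the names `CandidateLinkage` and `rank_and_check` and the sample scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateLinkage:
    entity: str        # entity recognized in the natural language query
    column: str        # data set column it might refer to
    confidence: float  # score from some upstream linking model


def rank_and_check(candidates, margin=0.1):
    """Rank candidates by confidence; flag ambiguity when the top two are close."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    ambiguous = (
        len(ranked) > 1
        and (ranked[0].confidence - ranked[1].confidence) < margin
    )
    return ranked, ambiguous


candidates = [
    CandidateLinkage("sales", "orders.total_sales", 0.62),
    CandidateLinkage("sales", "leads.sales_rep", 0.58),
    CandidateLinkage("sales", "stores.name", 0.10),
]
ranked, ambiguous = rank_and_check(candidates)
if ambiguous:
    # In an interactive setting, these options would be shown to the user to choose from.
    print([c.column for c in ranked[:2]])
```

When the flag is set, the top candidates would be presented for selection instead of silently using the first one.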
-
Publication No.: US11514054B1
Publication date: 2022-11-29
Application No.: US16145104
Filing date: 2018-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas
IPC: G06F16/2455, G06N20/00, G06F16/28, G06F16/22, G06F16/901
Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.
-
Publication No.: US20200159857A1
Publication date: 2020-05-21
Application No.: US16197222
Filing date: 2018-11-20
Applicant: Amazon Technologies, Inc.
Inventor: Stephen Michael Ash
Abstract: A data records service is configured to receive original data records and, in parallel, generate a transliterated version of the original data record into a phonetic based language. Individual fields of data records can be transliterated by identifying a primary language, generating language specific tokens for individual text portions, and transliterating the token. The records processing service can then execute matching models on both original data records and transliterated data records to detect matching data records.
-