-
Publication No.: US12007992B2
Publication date: 2024-06-11
Application No.: US17818878
Filing date: 2022-08-10
Inventors: Rahul Potharaju, Wentao Wu, Terry Y. Kim, Lev Novik, Apoorve Dave
IPC classification: G06F16/2453, G06F16/22, G06F16/25
CPC classification: G06F16/24542, G06F16/2272, G06F16/254
Abstract: Methods, systems and computer program products are described herein that provide a serverless, multi-engine, multi-user data lake indexing subsystem and application programming interface. Indexes are defined as derived datasets and stored on the data lake in a universal format that enables disparate engines to create and/or discover indexes for workload optimization. Embodiments of indexes enable stateful control and management of an index via metadata included in the index and stored on the data lake.
-
Publication No.: US20210382895A1
Publication date: 2021-12-09
Application No.: US16886420
Filing date: 2020-05-28
IPC classification: G06F16/2453, G06F16/2455
Abstract: Methods, systems, apparatuses, and computer program products are provided for determining a query plan. A query is received that comprises a request for a data result for each of a plurality of original time windows. The plurality of original time windows included in the query are identified. An initial window representation is generated that identifies a set of connections between windows in a window set that includes at least the original time windows. A revised window representation is generated that includes an alternative set of connections between windows in the window set based at least on an execution cost for at least one window. The revised window representation is selected to obtain the data result for each of the plurality of original time windows. A revised query plan based on the revised window representation is provided to obtain the data result for each of the plurality of original time windows.
-
Publication No.: US20210334242A1
Publication date: 2021-10-28
Application No.: US16856600
Filing date: 2020-04-23
Inventors: Rahul Potharaju, Terry Y. Kim, Wentao Wu
IPC classification: G06F16/185, G06F16/182, G06F16/13, G06F16/14, G06F16/17
Abstract: Methods, systems, apparatuses, and computer program products are described herein for the generation and traversal of a hierarchical index structure. The structure indexes search keys from data ingested from different data sources and enables efficient retrieval of the keys. As data is ingested, index nodes are generated at the lowest level of the structure. The nodes are analyzed to determine whether such nodes comprise duplicate keys. Responsive to doing so, a new index node is generated at a higher level of the structure. This process results in a DAG comprising orphan nodes including different search keys. When processing a query for search keys, the orphan index nodes are initially analyzed for the keys. Upon finding a search key, its child nodes are recursively searched until location information specifying the location of ingested data in which the search key is located is found.
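The traversal described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `IndexNode` class, the `link_if_duplicates` and `lookup` helpers, and the block names are all hypothetical, and the sketch assumes that a higher-level node simply holds the union of its children's keys. The descent uses an explicit stack rather than recursion.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    # Hypothetical node layout: leaves carry key -> location, inner nodes carry children.
    keys: set
    children: list = field(default_factory=list)
    locations: dict = field(default_factory=dict)


def link_if_duplicates(a, b):
    """Return a new higher-level node over a and b if they share a key, else None."""
    if a.keys & b.keys:
        return IndexNode(keys=a.keys | b.keys, children=[a, b])
    return None


def lookup(roots, key):
    """Start at the parentless (orphan) nodes and descend until leaves are reached."""
    found = []
    stack = [node for node in roots if key in node.keys]
    while stack:
        node = stack.pop()
        if node.children:
            # Only follow children that actually contain the key.
            stack.extend(c for c in node.children if key in c.keys)
        else:
            found.append(node.locations[key])
    return found


leaf_a = IndexNode({"k1", "k2"}, locations={"k1": "block-A", "k2": "block-A"})
leaf_b = IndexNode({"k2", "k3"}, locations={"k2": "block-B", "k3": "block-B"})
leaf_c = IndexNode({"k9"}, locations={"k9": "block-C"})

# leaf_a and leaf_b share "k2", so they get a common parent; leaf_c stays an orphan.
parent = link_if_duplicates(leaf_a, leaf_b)
roots = [parent, leaf_c]
print(sorted(lookup(roots, "k2")))
```

In this sketch a query never touches `leaf_c` when searching for `k2`, because the orphan-level key check filters it out before any descent.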
-
Publication No.: US12105713B2
Publication date: 2024-10-01
Application No.: US17740660
Filing date: 2022-05-10
Inventors: Tarique Ashraf Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Ravindranath Narasayya, Surajit Chaudhuri
IPC classification: G06F16/00, G06F11/34, G06F16/21, G06F16/22, G06F16/2453, G06F16/248
CPC classification: G06F16/24549, G06F11/3409, G06F16/21, G06F16/221, G06F16/24539, G06F16/248
Abstract: The present disclosure relates to methods and systems for compressing workloads for use with index tuning. The methods and systems receive a workload with a plurality of queries. The methods and systems represent each query using query features and a utility. The methods and systems select a query for a query subset based on a benefit of the query determined using the query features and the utility. The methods and systems update the features and the utility of the remaining queries in the workload and select another query to add to the query subset based on an updated benefit determined using the updated features and utilities. The methods and systems select queries for the query subset equal to a received query subset size. The methods and systems use the query subset in index tuning to provide one or more index recommendations.
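The select-then-update loop in this abstract can be sketched as a greedy procedure. The Python below is only an illustration under stated assumptions, not the patented method: the `Query` class, the benefit formula (utility times total feature weight), and the update rule (zero out features already covered by the selected query and scale utility by the uncovered fraction) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    features: dict   # hypothetical: indexable column -> weight
    utility: float   # hypothetical: e.g. the query's share of workload cost


def benefit(q):
    # Assumed benefit: utility scaled by the feature weight still uncovered.
    return q.utility * sum(q.features.values())


def compress_workload(workload, subset_size):
    """Greedily pick `subset_size` queries, re-scoring the rest after each pick."""
    # Work on copies so the caller's workload is left untouched.
    remaining = [Query(q.name, dict(q.features), q.utility) for q in workload]
    selected = []
    while remaining and len(selected) < subset_size:
        best = max(remaining, key=benefit)
        remaining.remove(best)
        selected.append(best.name)
        for q in remaining:
            before = sum(q.features.values())
            for col in best.features:
                if col in q.features:
                    q.features[col] = 0.0   # feature already represented
            after = sum(q.features.values())
            if before > 0:
                q.utility *= after / before  # assumed utility update
    return selected


workload = [
    Query("q1", {"a": 1.0, "b": 1.0}, 10.0),
    Query("q2", {"a": 1.0, "b": 1.0}, 8.0),   # near-duplicate of q1
    Query("q3", {"c": 1.0}, 5.0),
]
print(compress_workload(workload, 2))
```

With these toy numbers the near-duplicate `q2` loses all of its benefit once `q1` is chosen, so the second pick goes to `q3` even though `q2` started with a higher score.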
-
Publication No.: US11188534B2
Publication date: 2021-11-30
Application No.: US16742907
Filing date: 2020-01-14
IPC classification: G06F16/24, G06F16/2453, G06F16/2455, G16Y10/75
Abstract: This application describes a data stream processing system for receiving and processing multiple data streams using a stream join model and based on the stream join model having a threshold function for which an output of the stream join model crosses the threshold two or fewer times (e.g., where the threshold function is a convex function, linear function, monotonic function, or other function having a similar property). The data stream processing system may generate filtered data streams using a number of techniques and algorithms without risk of false negatives and missing instances where an output of a stream join exceeds or violates a threshold condition. The data stream processing system can significantly reduce processing expense, particularly in cases where one or more devices have limited memory and where caching tuples from incoming data streams consumes significant processing resources.
-
Publication No.: US10810202B2
Publication date: 2020-10-20
Application No.: US16008905
Filing date: 2018-06-14
Inventors: Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, Vivek R Narasayya
IPC classification: G06F16/00, G06F16/2453
Abstract: Systems, methods, and computer-executable instructions for creating a query execution plan for a query of a database include receiving, from the database, a set of previously executed query execution plans for the query. Each previously executed query execution plan includes subplans. Each subplan indicates a tree of physical operators. Physical operators that executed in the set of previously executed query execution plans are determined. For each physical operator, an execution cost is determined. Physical operators from the previously executed query execution plans that are invalid for the database are removed. Equivalent subplans from the previously executed query execution plans are identified based on physical properties and logical expressions of the subplans. A constrained search space is created based on the equivalent subplans. A query execution plan for the query is constructed from the constrained search space based on the execution cost. The constructed query execution plan is not within the previously executed query execution plans.
-
Publication No.: US11531663B2
Publication date: 2022-12-20
Application No.: US16859733
Filing date: 2020-04-27
Inventors: Rahul Potharaju, Terry Y. Kim, Wentao Wu
IPC classification: G06F7/00, G06F16/00, G06F16/22, G06F16/248, G06F16/245
Abstract: Methods, systems, apparatuses, and computer program products are directed to the generation of a global index structure. Agents executing on different data sources locally pre-process (e.g., format, filter, compress, encode, serialize, etc.) data generated thereby and index such data. The agents also manage the resources thereof to perform the pre-processing and indexing operations. Each index generated by an agent is formatted as a plurality of index nodes. The index nodes and pre-processed data are provided to backend server(s) that maintain the global index structure and store the data in a globally distributed file system, which aids in unexpected disaster recovery. The backend server(s) generate the global index structure based on the index nodes. As new index nodes are received by the backend servers, the backend servers merge the newly received index nodes with the global index structure. Global index structure traversal techniques for retrieving search keys are also described herein.
-
Publication No.: US20210382897A1
Publication date: 2021-12-09
Application No.: US16885878
Filing date: 2020-05-28
Inventors: Rahul Potharaju, Wentao Wu
IPC classification: G06F16/2453, G06F16/2458, G06F16/901
Abstract: Methods, systems and computer program products are described herein that enable data workload optimization through “what-if” modeling of indexes and index recommendation. In an example aspect, a system is configured to accept a workload comprising a plurality of queries directed at data having a first physical data layout, generate a set of candidate indexes based on the plurality of queries, enumerate index configurations based on the set of candidate indexes, each index configuration comprising a subset of the set of candidate indexes, generate a hierarchical graph of the index configurations, search the hierarchical graph for a recommended index configuration comprising an index configuration with the lowest estimated cost while pruning index configurations not considered from the graph of index configurations to generate a pruned graph, execute a graph query against the pruned graph generating a graph query result and perform an optimization operation based on the graph query result.
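The enumerate, search, and prune steps in this abstract can be sketched as a level-wise walk over a lattice of index configurations. The Python below is a simplified illustration, not the patented system: the cost model in `estimate_cost`, the workload dictionary fields (`useful_indexes`, `base_cost`, `indexed_cost`), the index names, and the pruning rule (drop a child configuration that does not beat its parent's estimated cost) are all assumptions made for the example.

```python
def estimate_cost(config, workload):
    """Hypothetical what-if cost: a query is cheap if the config holds an index it can use."""
    total = 0.0
    for q in workload:
        if config & q["useful_indexes"]:
            total += q["indexed_cost"]
        else:
            total += q["base_cost"]
    return total


def recommend(candidates, workload, max_size):
    """Search configurations level by level, pruning children that do not improve."""
    best_config = frozenset()
    best_cost = estimate_cost(best_config, workload)
    frontier = {frozenset()}             # level 0: the empty configuration
    for _ in range(max_size):
        next_frontier = set()
        for config in frontier:
            parent_cost = estimate_cost(config, workload)
            for index in candidates - config:
                child = config | {index}
                cost = estimate_cost(child, workload)
                if cost < parent_cost:   # assumed pruning rule
                    next_frontier.add(child)
                    if cost < best_cost:
                        best_config, best_cost = child, cost
        frontier = next_frontier
    return best_config, best_cost


candidates = {"idx_a", "idx_b", "idx_c"}
workload = [
    {"useful_indexes": {"idx_a"}, "base_cost": 100.0, "indexed_cost": 10.0},
    {"useful_indexes": {"idx_b"}, "base_cost": 50.0, "indexed_cost": 5.0},
    {"useful_indexes": {"idx_a", "idx_c"}, "base_cost": 20.0, "indexed_cost": 2.0},
]
config, cost = recommend(candidates, workload, max_size=2)
print(sorted(config), cost)
```

Each level of the frontier corresponds to one layer of the configuration hierarchy, so the pruned frontier is a small stand-in for the pruned graph the abstract mentions.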
-
Publication No.: US12066993B2
Publication date: 2024-08-20
Application No.: US17832274
Filing date: 2022-06-03
Inventors: Wentao Wu, Chi Wang, Tarique Ashraf Siddiqui, Vivek Ravindranath Narasayya, Surajit Chaudhuri
IPC classification: G06F16/21, G06F16/22, G06F16/2453
CPC classification: G06F16/217, G06F16/2246, G06F16/2453
Abstract: The present disclosure relates to systems, methods, and computer-readable media for determining optimal index configurations for processing workloads in a database management system. For instance, an index configuration system can efficiently determine a subset of indexes for processing a workload utilizing one or more reinforcement learning models. For example, in various implementations, the index configuration system utilizes a Markov decision process and/or a Monte Carlo tree search model to determine an optimal subset of indexes for processing a workload in a manner that effectively utilizes computing device resources while also avoiding significant interference with customer workloads.
-
Publication No.: US11567906B2
Publication date: 2023-01-31
Application No.: US16856600
Filing date: 2020-04-23
Inventors: Rahul Potharaju, Terry Y. Kim, Wentao Wu
IPC classification: G06F16/00, G06F16/185, G06F16/182, G06F16/17, G06F16/14, G06F16/13
Abstract: Methods, systems, apparatuses, and computer program products are described herein for the generation and traversal of a hierarchical index structure. The structure indexes search keys from data ingested from different data sources and enables efficient retrieval of the keys. As data is ingested, index nodes are generated at the lowest level of the structure. The nodes are analyzed to determine whether such nodes comprise duplicate keys. Responsive to doing so, a new index node is generated at a higher level of the structure. This process results in a DAG comprising orphan nodes including different search keys. When processing a query for search keys, the orphan index nodes are initially analyzed for the keys. Upon finding a search key, its child nodes are recursively searched until location information specifying the location of ingested data in which the search key is located is found.