OPTIMIZING AGGREGATES OVER CORRELATED WINDOWS IN A STREAM-QUERY SYSTEM

    Publication No.: US20210382895A1

    Publication date: 2021-12-09

    Application No.: US16886420

    Filing date: 2020-05-28

    IPC classification: G06F16/2453 G06F16/2455

    Abstract: Methods, systems, apparatuses, and computer program products are provided for determining a query plan. A query is received that comprises a request for a data result for each of a plurality of original time windows. The plurality of original time windows included in the query are identified. An initial window representation is generated that identifies a set of connections between windows in a window set that includes at least the original time windows. A revised window representation is generated that includes an alternative set of connections between windows in the window set based at least on an execution cost for at least one window. The revised window representation is selected to obtain the data result for each of the plurality of original time windows. A revised query plan based on the revised window representation is provided to obtain the data result for each of the plurality of original time windows.
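
The idea of revising the connections between windows by execution cost can be illustrated with a minimal sketch. It is not the patented method: it assumes tumbling windows only, a decomposable aggregate such as a sum, and a hypothetical cost model in which a window costs either the number of raw events it reads or the number of partial results it combines. The function name `plan_windows` and the parameter `events_per_unit` are made up for illustration.

```python
def plan_windows(sizes, events_per_unit=1):
    """Pick the cheapest source for each tumbling window.

    Returns {window_size: (source, cost)}, where source is None when the
    window reads the raw stream, or the size of the smaller window whose
    partial results it combines. The cost model is a hypothetical one.
    """
    windows = sorted(set(sizes))
    plan = {}
    for w in windows:
        # Initial representation: every window reads the raw stream.
        best_source, best_cost = None, w * events_per_unit
        for smaller in windows:
            # A smaller tumbling window can feed w only if it tiles w exactly.
            if smaller < w and w % smaller == 0:
                cost = w // smaller  # partial results combined per output
                if cost < best_cost:
                    best_source, best_cost = smaller, cost
        plan[w] = (best_source, best_cost)
    return plan


print(plan_windows([10, 20, 40]))
```

Under these assumptions the 20-unit window is fed by the 10-unit window and the 40-unit window by the 20-unit window, so only the smallest window touches the raw stream.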

    GENERATION AND TRAVERSAL OF A HIERARCHICAL INDEX STRUCTURE FOR EFFICIENT DATA RETRIEVAL

    Publication No.: US20210334242A1

    Publication date: 2021-10-28

    Application No.: US16856600

    Filing date: 2020-04-23

    Abstract: Methods, systems, apparatuses, and computer program products are described herein for the generation and traversal of a hierarchical index structure. The structure indexes search keys from data ingested from different data sources and enables efficient retrieval of the keys. As data is ingested, index nodes are generated at the lowest level of the structure. The nodes are analyzed to determine whether such nodes comprise duplicate keys. Responsive to determining that the nodes comprise duplicate keys, a new index node is generated at a higher level of the structure. This process results in a directed acyclic graph (DAG) comprising orphan nodes including different search keys. When processing a query for search keys, the orphan index nodes are initially analyzed for the keys. Upon finding a search key, its child nodes are recursively searched until location information specifying the location of ingested data in which the search key is located is found.

    Optimizing input streams for a stream join model having a threshold function

    Publication No.: US11188534B2

    Publication date: 2021-11-30

    Application No.: US16742907

    Filing date: 2020-01-14

    Abstract: This application describes a data stream processing system for receiving and processing multiple data streams using a stream join model, where the stream join model has a threshold function for which an output of the stream join model crosses the threshold two or fewer times (e.g., where the threshold function is a convex function, linear function, monotonic function, or other function having a similar property). The data stream processing system may generate filtered data streams using a number of techniques and algorithms without risk of false negatives and missing instances where an output of a stream join exceeds or violates a threshold condition. The data stream processing system can significantly reduce processing expense, particularly in cases where one or more devices have limited memory and where caching tuples from incoming data streams consumes significant processing resources.

    Execution plan stitching

    Publication No.: US10810202B2

    Publication date: 2020-10-20

    Application No.: US16008905

    Filing date: 2018-06-14

    IPC classification: G06F16/00 G06F16/2453

    Abstract: Systems, methods, and computer-executable instructions for creating a query execution plan for a query of a database include receiving, from the database, a set of previously executed query execution plans for the query. Each previously executed query execution plan includes subplans. Each subplan indicates a tree of physical operators. Physical operators that executed in the set of previously executed query execution plans are determined. For each physical operator, an execution cost is determined. Physical operators from the previously executed query execution plans that are invalid for the database are removed. Equivalent subplans from the previously executed query execution plans are identified based on physical properties and logical expressions of the subplans. A constrained search space is created based on the equivalent subplans. A query execution plan for the query is constructed from the constrained search space based on the execution cost. The constructed query execution plan is not within the previously executed query execution plans.
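
A rough sketch of the stitching idea follows. It is an illustration under strong simplifying assumptions, not the patented procedure: each plan is flattened to a list of subplans instead of a tree of physical operators, equivalence is reduced to a shared hypothetical `signature` string, and the removal of invalid operators is skipped. All names and costs are invented.

```python
def stitch_plan(previous_plans):
    """Return the cheapest known subplan for each equivalence signature.

    previous_plans: list of plans; each plan is a list of
    (signature, subplan_name, cost) tuples. All fields are hypothetical.
    """
    cheapest = {}
    for plan in previous_plans:
        for signature, subplan, cost in plan:
            # Keep the lowest-cost alternative among equivalent subplans.
            if signature not in cheapest or cost < cheapest[signature][1]:
                cheapest[signature] = (subplan, cost)
    return cheapest


plan_a = [("scan_orders", "IndexSeek", 5), ("join", "NestedLoopJoin", 40)]
plan_b = [("scan_orders", "TableScan", 30), ("join", "HashJoin", 12)]
stitched = stitch_plan([plan_a, plan_b])
print(stitched)
```

Here the stitched result pairs the scan from `plan_a` with the join from `plan_b`, a combination that appears in neither earlier plan, which mirrors the last sentence of the abstract.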

    Agent-based data pre-processing and data indexing for efficient data retrieval

    Publication No.: US11531663B2

    Publication date: 2022-12-20

    Application No.: US16859733

    Filing date: 2020-04-27

    Abstract: Methods, systems, apparatuses, and computer program products are directed to the generation of a global index structure. Agents executing on different data sources locally pre-process (e.g., format, filter, compress, encode, serialize, etc.) data generated thereby and index such data. The agents also manage the resources thereof to perform the pre-processing and indexing operations. Each index generated by an agent is formatted as a plurality of index nodes. The index nodes and pre-processed data are provided to backend server(s) that maintain the global index structure and store the data in a globally distributed file system, which aids recovery from unexpected disasters. The backend server(s) generate the global index structure based on the index nodes. As new index nodes are received by the backend servers, the backend servers merge the newly received index nodes with the global index structure. Global index structure traversal techniques for retrieving search keys are also described herein.

    DATA LAKE WORKLOAD OPTIMIZATION THROUGH EXPLAINING AND OPTIMIZING INDEX RECOMMENDATIONS

    Publication No.: US20210382897A1

    Publication date: 2021-12-09

    Application No.: US16885878

    Filing date: 2020-05-28

    Abstract: Methods, systems and computer program products are described herein that enable data workload optimization through “what-if” modeling of indexes and index recommendation. In an example aspect, a system is configured to accept a workload comprising a plurality of queries directed at data having a first physical data layout, generate a set of candidate indexes based on the plurality of queries, enumerate index configurations based on the set of candidate indexes, each index configuration comprising a subset of the set of candidate indexes, generate a hierarchical graph of the index configurations, search the hierarchical graph for a recommended index configuration comprising an index configuration with the lowest estimated cost while pruning index configurations not considered from the graph of index configurations to generate a pruned graph, execute a graph query against the pruned graph generating a graph query result, and perform an optimization operation based on the graph query result.
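
The enumerate-and-search step can be pictured with a small sketch. It is a brute-force stand-in, not the described system: it enumerates every subset of the candidate indexes up to a size limit and keeps the one with the lowest estimated cost, with no hierarchical graph and no pruning. The function `recommend`, the `what_if_cost` model, and the index names are all hypothetical.

```python
from itertools import combinations


def recommend(candidates, cost_fn, max_size):
    """Return (configuration, cost) with the lowest estimated cost.

    candidates: iterable of index names.
    cost_fn: maps a frozenset of index names to an estimated workload cost.
    max_size: largest number of indexes allowed in one configuration.
    """
    best_config = frozenset()
    best_cost = cost_fn(best_config)  # baseline: no indexes at all
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(candidates), k):
            config = frozenset(combo)
            cost = cost_fn(config)
            if cost < best_cost:
                best_config, best_cost = config, cost
    return best_config, best_cost


# Hypothetical "what-if" model: each index saves some query cost
# but adds one unit of maintenance cost.
SAVINGS = {"idx_a": 5, "idx_b": 3, "idx_c": 0}


def what_if_cost(config):
    return 10 - sum(SAVINGS[name] for name in config) + len(config)


print(recommend(SAVINGS, what_if_cost, max_size=3))
```

With this toy cost model the recommendation is `idx_a` plus `idx_b`, because `idx_c` saves nothing and only adds maintenance cost.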

    Generation and traversal of a hierarchical index structure for efficient data retrieval

    Publication No.: US11567906B2

    Publication date: 2023-01-31

    Application No.: US16856600

    Filing date: 2020-04-23

    Abstract: Methods, systems, apparatuses, and computer program products are described herein for the generation and traversal of a hierarchical index structure. The structure indexes search keys from data ingested from different data sources and enables efficient retrieval of the keys. As data is ingested, index nodes are generated at the lowest level of the structure. The nodes are analyzed to determine whether such nodes comprise duplicate keys. Responsive to determining that the nodes comprise duplicate keys, a new index node is generated at a higher level of the structure. This process results in a directed acyclic graph (DAG) comprising orphan nodes including different search keys. When processing a query for search keys, the orphan index nodes are initially analyzed for the keys. Upon finding a search key, its child nodes are recursively searched until location information specifying the location of ingested data in which the search key is located is found.