ADDRESSING MEMORY LIMITS FOR PARTITION TRACKING AMONG WORKER NODES

    Publication No.: US20240320231A1

    Publication date: 2024-09-26

    Application No.: US18626007

    Application date: 2024-04-03

    Applicant: Splunk Inc.

    CPC classification number: G06F16/2471 G06F16/278

    Abstract: Systems and methods are described for distributed processing of a query in a first query language utilizing a query execution engine intended for single-device execution. While distributed processing provides numerous benefits over single-device processing, distributed query execution engines can be significantly more difficult to develop than single-device engines. Embodiments of this disclosure enable the use of a single-device engine to support distributed processing, by dividing a query into multiple stages, each of which can be executed by multiple, concurrent executions of a single-device engine. Between stages, data can be shuffled between executions of the engine, such that individual executions of the engine are provided with a complete set of records needed to implement an individual stage. Because single-device engines can be significantly less difficult to develop, use of the techniques described herein can enable a distributed system to rapidly support multiple query languages.
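
    The staged execution described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`single_device_count`, `shuffle`, `run_distributed_count`), the count-per-key query, and the CRC32-based routing are all assumptions chosen for illustration. The sketch shows a single-device routine being run once per partition, a shuffle that routes all partial results for a key to one execution, and a second stage that finishes the aggregation.

```python
import zlib
from collections import defaultdict


def single_device_count(records):
    """Hypothetical stand-in for a single-device engine: sum counts per key."""
    counts = defaultdict(int)
    for key, n in records:
        counts[key] += n
    return dict(counts)


def shuffle(stage_outputs, num_workers):
    """Route every (key, count) pair to the one worker that owns that key.

    Assumption: ownership is decided by a CRC32 hash of the key, so each
    second-stage execution receives the complete set of records for its keys.
    """
    inboxes = [[] for _ in range(num_workers)]
    for output in stage_outputs:
        for key, n in output.items():
            owner = zlib.crc32(key.encode("utf-8")) % num_workers
            inboxes[owner].append((key, n))
    return inboxes


def run_distributed_count(partitions, num_workers=2):
    # Stage 1: each engine execution sees only its own partition.
    stage1 = [single_device_count((key, 1) for key in part) for part in partitions]
    # Shuffle: regroup partial counts so that each key lands on one worker.
    inboxes = shuffle(stage1, num_workers)
    # Stage 2: the same single-device routine merges the partial counts.
    stage2 = [single_device_count(inbox) for inbox in inboxes]
    # Keys are disjoint across workers, so a plain merge is safe.
    merged = {}
    for output in stage2:
        merged.update(output)
    return merged


if __name__ == "__main__":
    parts = [["error", "info", "error"], ["info", "warn"], ["error"]]
    print(run_distributed_count(parts))
```

    The same routine serves both stages here only because counting is associative; a real query would generally need a different operation per stage.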

GENERATING A SUBQUERY FOR AN EXTERNAL DATA SYSTEM USING A CONFIGURATION FILE

    Publication No.: US20230214386A1

    Publication date: 2023-07-06

    Application No.: US18181900

    Application date: 2023-03-10

    Applicant: Splunk Inc.

    CPC classification number: G06F16/24535 G06F16/2425 G06F16/258 G06F16/22

    Abstract: Systems and methods are disclosed for receiving, at a data intake and query system, a query that includes an indication to process data managed by a third-party data storage and processing system that supports a different query language than the data intake and query system. The data intake and query system identifies a third-party data storage and processing system that manages the data to be processed and generates a subquery for execution by the third-party data storage and processing system, generates instructions for one or more worker nodes to receive and process results of the subquery from the third-party data storage and processing system, and instructs the worker nodes to provide results of the processing to the data intake and query system.

    Bucket data distribution for exporting data to worker nodes

    Publication No.: US11580107B2

    Publication date: 2023-02-14

    Application No.: US16398038

    Application date: 2019-04-29

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for exporting bucket data from one or more buckets to one or more worker nodes. The system can identify bucket data, from buckets stored in a data intake and query system, that is to be processed by one or more worker nodes. The system can allocate one or more execution resources, such as a processing pipeline, to process and export the bucket data from the buckets. The system can assign bucket data corresponding to individual buckets to the execution resource based on a bucket distribution policy. The indexer can export the bucket data to the worker nodes for further processing based on the bucket data-execution resource assignment.
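
    The abstract does not say what the bucket distribution policy is. As one possible illustration, the sketch below assumes a greedy "least-loaded pipeline first" policy keyed on bucket size; the function name `assign_buckets`, the bucket identifiers, and the sizes are all hypothetical.

```python
import heapq


def assign_buckets(bucket_sizes, num_pipelines):
    """Assign each bucket to the pipeline with the least data so far.

    Assumed policy (not taken from the patent): process buckets from largest
    to smallest and always hand the next one to the least-loaded pipeline.
    """
    if num_pipelines < 1:
        raise ValueError("num_pipelines must be at least 1")
    # Min-heap of (current load, pipeline index).
    heap = [(0, i) for i in range(num_pipelines)]
    heapq.heapify(heap)
    assignment = {i: [] for i in range(num_pipelines)}
    # Sort by size descending, then by id so that ties resolve deterministically.
    ordered = sorted(bucket_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for bucket_id, size in ordered:
        load, pipeline = heapq.heappop(heap)
        assignment[pipeline].append(bucket_id)
        heapq.heappush(heap, (load + size, pipeline))
    return assignment


if __name__ == "__main__":
    sizes = {"b1": 50, "b2": 30, "b3": 20, "b4": 10}
    print(assign_buckets(sizes, 2))
```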

    Determining a record generation estimate of a processing task

    Publication No.: US11442935B2

    Publication date: 2022-09-13

    Application No.: US16397930

    Application date: 2019-04-29

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for determining a record generation estimate related to a particular processing task. The system obtains a sample set of data that includes multiple records. The system applies a processing task, such as a transform or regular expression rule, to the sample set of data and determines how many records are generated by the processing task. Based on the number of records generated, the system determines a record generation estimate. The system can use the record generation estimate to allocate compute resources or determine a query execution time for at least a portion of the query.
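
    A minimal sketch of the sampling idea follows. The abstract does not state how the estimate is derived from the sample count, so the linear scaling used here (generated records per sampled record, multiplied by the total record count) is an assumption, as are the names `estimate_generated_records` and `split_on_semicolons`.

```python
import re


def split_on_semicolons(record):
    """Hypothetical processing task: one input record yields one output
    record per non-empty semicolon-separated field."""
    return [part for part in re.split(r";", record) if part]


def estimate_generated_records(sample, task, total_records):
    """Estimate how many records `task` would generate over the full data set.

    Assumption: the sample is representative, so the per-record generation
    ratio observed on the sample scales linearly to `total_records`.
    """
    if not sample:
        raise ValueError("sample must contain at least one record")
    generated = sum(len(task(record)) for record in sample)
    ratio = generated / len(sample)
    return ratio * total_records


if __name__ == "__main__":
    sample = ["a;b", "c", "d;e;f"]
    print(estimate_generated_records(sample, split_on_semicolons, 1000))
```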
