Distributed data processing in multi-tenant environments

    Publication No.: US10853358B2

    Publication date: 2020-12-01

    Application No.: US16024264

    Application date: 2018-06-29

    Abstract: Methods, systems, and devices for data processing within a distributed data system are described. In a multi-tenant distributed data system, a provider may supply executable code for processing data using declarative processing instructions received from a tenant. For example, a tenant may provide tenant-specific processing instructions for a requested set of data. The processing instructions may indicate input information (e.g., a data structure, tenant-specific fields, etc.), transformation information (e.g., from a set of pre-defined transformations), and output information. The provider-supplied code may use the tenant-specific processing instructions to process and generate the requested set of data, where the code may be executed by multiple nodes within the system. As such, the code executed by multiple nodes may utilize the input information, transformation information, and output information from the tenant-specific processing instructions to generate the requested data and provide the data to the tenant.
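The division of labor described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the instruction layout (`input`, `transformations`, `output`), the transformation names `uppercase` and `scale`, and the functions `process_partition` and `run_job` are all hypothetical, and each list of rows stands in for the data held by one node.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical catalogue of pre-defined transformations offered by the provider.
TRANSFORMS: Dict[str, Callable[[Row, Dict[str, Any]], Row]] = {
    "uppercase": lambda row, p: {**row, p["field"]: str(row[p["field"]]).upper()},
    "scale": lambda row, p: {**row, p["field"]: row[p["field"]] * p["factor"]},
}


def process_partition(rows: List[Row], instructions: Dict[str, Any]) -> List[Row]:
    """Provider-supplied code: identical on every node, driven only by the
    tenant's declarative instructions."""
    in_fields = instructions["input"]["fields"]
    out_fields = instructions["output"]["fields"]
    result = []
    for row in rows:
        # Input information: keep only the fields the tenant asked for.
        current = {f: row[f] for f in in_fields}
        # Transformation information: apply the named steps in order.
        for step in instructions["transformations"]:
            current = TRANSFORMS[step["name"]](current, step.get("params", {}))
        # Output information: project onto the requested output fields.
        result.append({f: current[f] for f in out_fields})
    return result


def run_job(partitions: List[List[Row]], instructions: Dict[str, Any]) -> List[Row]:
    """Simulate several nodes by processing each partition independently."""
    output: List[Row] = []
    for partition in partitions:
        output.extend(process_partition(partition, instructions))
    return output
```

The tenant never supplies code here, only a description of fields and named steps, which is what lets the same provider code run safely for many tenants.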

    Interactively building previews of extract, transform, load (ETL) graphs using cached previews of subgraphs

    Publication No.: US11841872B2

    Publication date: 2023-12-12

    Application No.: US17385393

    Application date: 2021-07-26

    CPC classification number: G06F16/254 G06F3/0486 G06F16/24552 G06F16/26

    Abstract: Disclosed are some implementations of systems, apparatus, methods and computer program products for executing a process flow represented by a graph or portion thereof using cached subgraphs. A first request to execute a first portion of a process flow is processed, where the first portion of the process flow is represented by a first subgraph of a graph representing the process flow and a final node of the first subgraph corresponds to a set of computer-readable instructions. The first portion of the process flow is executed such that a first output of executing the first portion of the process flow is obtained. The first subgraph is stored in association with the first output in a first cache entry of a cache. A second request to execute a second portion of the process flow is processed, where the second portion of the process flow is represented by a second subgraph of the graph. At least one cache entry for which a corresponding subgraph matches at least a portion of the second subgraph is identified in the cache, where the at least one cache entry includes the first cache entry. The first output is retrieved from the first cache entry, a node of the second subgraph to which the final node of the first subgraph is connected is identified, and the second portion of the process flow is executed by providing the first output as input to the identified node of the second subgraph without executing the set of computer-readable instructions.
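The cache lookup described in this abstract can be sketched roughly as below. The sketch makes several assumptions that are not in the abstract: a node's name is taken to identify its instructions, a subgraph is keyed by the nested names of its final node and all ancestors, and every intermediate subgraph is cached rather than only the requested one. The names `Node`, `subgraph_key`, and `PreviewExecutor` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class Node:
    """One step of a process flow; `parents` are the nodes feeding into it."""

    def __init__(self, name: str, func: Callable[..., Any],
                 parents: Sequence["Node"] = ()):
        self.name = name
        self.func = func
        self.parents = list(parents)


def subgraph_key(node: Node) -> Tuple:
    """Structural key for the subgraph whose final node is `node`."""
    return (node.name, tuple(subgraph_key(p) for p in node.parents))


class PreviewExecutor:
    def __init__(self) -> None:
        self.cache: Dict[Tuple, Any] = {}   # subgraph key -> cached output
        self.executed: List[str] = []       # names of nodes actually run

    def execute(self, node: Node) -> Any:
        key = subgraph_key(node)
        if key in self.cache:
            # A matching subgraph was already executed: reuse its output
            # instead of running the node's instructions again.
            return self.cache[key]
        inputs = [self.execute(p) for p in node.parents]
        self.executed.append(node.name)
        output = node.func(*inputs)
        self.cache[key] = output
        return output


# Example flow: extract -> filter_even -> total
extract = Node("extract", lambda: [1, 2, 3, 4])
filter_even = Node("filter_even",
                   lambda rows: [r for r in rows if r % 2 == 0], [extract])
total = Node("total", lambda rows: sum(rows), [filter_even])
```

Previewing `filter_even` first and `total` second means the second request runs only the `total` node; the output cached for the first subgraph is fed to it as input.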

    Distributed data processing in multi-tenant environments

    Publication No.: US20200004858A1

    Publication date: 2020-01-02

    Application No.: US16024264

    Application date: 2018-06-29

    Abstract: Methods, systems, and devices for data processing within a distributed data system are described. In a multi-tenant distributed data system, a provider may supply executable code for processing data using declarative processing instructions received from a tenant. For example, a tenant may provide tenant-specific processing instructions for a requested set of data. The processing instructions may indicate input information (e.g., a data structure, tenant-specific fields, etc.), transformation information (e.g., from a set of pre-defined transformations), and output information. The provider-supplied code may use the tenant-specific processing instructions to process and generate the requested set of data, where the code may be executed by multiple nodes within the system. As such, the code executed by multiple nodes may utilize the input information, transformation information, and output information from the tenant-specific processing instructions to generate the requested data and provide the data to the tenant.

    Interactive dataflow preview
    Granted invention patent

    Publication No.: US11755608B2

    Publication date: 2023-09-12

    Application No.: US16740918

    Application date: 2020-01-13

    CPC classification number: G06F16/254 G06F16/258

    Abstract: Described herein are systems, apparatus, methods and computer program products for implementing design time and batch time for an extract, transform, load (ETL) process. When a session is established, a Kubernetes pod instance may be exclusively associated with a user for the user's sessions. Design time and batch time may both be performed within the Kubernetes pod instance. As such, a service provider may provide a secure dataflow preview. Furthermore, the dataflow may be cached. Caching of the dataflow allows the service provider to more quickly provide follow up previews, decreasing latency.

    Orchestration for data pipeline execution plans

    Publication No.: US20210240519A1

    Publication date: 2021-08-05

    Application No.: US16779040

    Application date: 2020-01-31

    Abstract: Methods, systems, and devices supporting dynamic process orchestration are described. An orchestration server may receive a request defining a data modification process from a user device. The orchestration server may generate an execution file based on the request, and the execution file may include a set of tasks for performing the data modification process and an order for performing the set of tasks. The orchestration server may execute, for the execution file, a first subset of tasks according to the order for performing the set of tasks and, in some cases, may update the execution file based on executing the first subset of tasks. For example, updating the execution file may involve modifying a second subset of tasks of the set of tasks. The orchestration server may execute, for the updated execution file, the modified second subset of tasks according to the order for performing the set of tasks.
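The execute, update, execute cycle described in this abstract might look like the sketch below. Everything concrete in it is an assumption made for illustration: the `ExecutionFile` class, the task names `count_rows` and `load`, the request keys `rows` and `batch_size`, and the rule that switches to batched loading when the row count exceeds the batch size.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class ExecutionFile:
    """Hypothetical execution file: the tasks plus the order to run them in."""
    tasks: Dict[str, Callable[[dict], None]]
    order: List[str]


def build_execution_file(request: dict) -> ExecutionFile:
    def count_rows(state: dict) -> None:
        state["row_count"] = len(request["rows"])

    def load_single(state: dict) -> None:
        state["batches"] = 1

    return ExecutionFile(
        tasks={"count_rows": count_rows, "load": load_single},
        order=["count_rows", "load"],
    )


def run_subset(plan: ExecutionFile, names: Set[str], state: dict) -> None:
    """Run the named tasks, always following the plan's overall order."""
    for name in plan.order:
        if name in names:
            plan.tasks[name](state)


def orchestrate(request: dict) -> dict:
    plan = build_execution_file(request)
    state: dict = {}

    # Execute the first subset of tasks.
    run_subset(plan, {"count_rows"}, state)

    # Update the execution file based on what the first subset found:
    # here, the second subset ("load") is modified to load in batches.
    if state["row_count"] > request["batch_size"]:
        def load_batched(state: dict) -> None:
            # Ceiling division: number of batches needed for all rows.
            state["batches"] = -(-state["row_count"] // request["batch_size"])

        plan.tasks["load"] = load_batched

    # Execute the (possibly modified) second subset.
    run_subset(plan, {"load"}, state)
    return state
```

The point of the design is that the plan is data the orchestrator can rewrite between stages, so later tasks can adapt to what earlier tasks observed.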
