MAINTAINING THROUGHPUT OF A STREAM PROCESSING FRAMEWORK WHILE INCREASING PROCESSING LOAD

    Publication No.: US20180253335A1

    Publication Date: 2018-09-06

    Application No.: US15973230

    Application Date: 2018-05-07

    CPC classification number: G06F9/505 G06F3/0613 G06F3/0631 G06F3/067 G06F9/5061

    Abstract: The technology disclosed relates to maintaining throughput of a stream processing framework while increasing processing load. In particular, it relates to defining a container over at least one worker node that has a plurality of workers, with one worker utilizing a whole core within a worker node, and queuing data from one or more incoming near real-time (NRT) data streams in multiple pipelines that run in the container and have connections to at least one common resource external to the container. It further relates to concurrently executing the pipelines at a number of workers as batches, and limiting simultaneous connections to the common resource to the number of workers by providing a shared connection to a set of batches running on a same worker regardless of the pipelines to which the batches in the set belong.
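
    A minimal sketch of the connection-limiting idea in this abstract: each worker holds a single connection to the external resource, and every batch that lands on that worker reuses it, whatever pipeline the batch belongs to, so open connections never exceed the worker count. Names such as SharedConnection and NUM_WORKERS are illustrative assumptions, not taken from the patent.

        import threading
        from concurrent.futures import ThreadPoolExecutor

        NUM_WORKERS = 4  # upper bound on simultaneous connections

        class SharedConnection:
            """Hypothetical stand-in for the common external resource."""
            def __init__(self, worker_id):
                self.worker_id = worker_id
                self.lock = threading.Lock()

            def write(self, record):
                with self.lock:
                    print(f"worker {self.worker_id} -> {record}")

        _local = threading.local()  # one connection per worker thread

        def get_worker_connection():
            # Lazily create the worker's single connection; batches from
            # any pipeline running on this worker share it.
            if not hasattr(_local, "conn"):
                _local.conn = SharedConnection(threading.get_ident())
            return _local.conn

        def run_batch(pipeline_id, batch):
            conn = get_worker_connection()
            for record in batch:
                conn.write((pipeline_id, record))

        if __name__ == "__main__":
            work = [("pipeline-A", [1, 2, 3]), ("pipeline-B", [4, 5, 6])]
            with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
                for pid, batch in work * 3:
                    pool.submit(run_batch, pid, batch)

    Because connections are keyed to workers rather than to batches or pipelines, adding pipelines increases processing load without increasing the connection count against the shared resource.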

    Recovery strategy for a stream processing system

    Publication No.: US09946593B2

    Publication Date: 2018-04-17

    Application No.: US15004887

    Application Date: 2016-01-22

    Abstract: The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of the batch size of a given batch during its replay after a processing failure. This problem results in an over-count when the input during replay is not a superset of the input fed during the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.
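
    The two failure modes the abstract identifies (a batch whose input changes between play and replay, and counter updates repeated on replay) are commonly addressed by pinning a batch's input at first play and making counter commits idempotent. The sketch below illustrates that conventional remedy under those assumptions; the snapshot store and commit marker are hypothetical, not the patent's mechanism.

        from collections import Counter

        counters = Counter()
        snapshots = {}   # batch_id -> records pinned at original play
        applied = set()  # batch_ids whose counts are already committed

        def process_batch(batch_id, read_input):
            # Pin the batch contents once; a replay reuses the pinned
            # records instead of re-reading the (possibly changed) source,
            # so the replay input cannot shrink or grow.
            if batch_id not in snapshots:
                snapshots[batch_id] = list(read_input())
            if batch_id in applied:
                return  # idempotent replay: counts already reflected
            for key in snapshots[batch_id]:
                counters[key] += 1
            # In a real system this marker would be committed atomically
            # with the counter updates.
            applied.add(batch_id)

        process_batch("batch-7", lambda: ["a", "b", "a"])
        process_batch("batch-7", lambda: ["a"])  # replay, input changed
        assert counters == Counter({"a": 2, "b": 1})  # no over-count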

    MANAGING PROCESSING OF LONG TAIL TASK SEQUENCES IN A STREAM PROCESSING FRAMEWORK

    Publication No.: US20170083378A1

    Publication Date: 2017-03-23

    Application No.: US14986419

    Application Date: 2015-12-31

    CPC classification number: G06F9/5038 G06F9/5072 G06F9/5088 G06F17/30516

    Abstract: The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads that process data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. The method also includes assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines.
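
    A minimal sketch of the priority dispatch the abstract describes, collapsing the grid-coordinator and grid-scheduler into one in-process queue: batches carry their pipeline's priority level, and all waiting batches from a higher-priority pipeline dispatch before any batch from a lower-priority one. Class and method names are illustrative assumptions.

        import heapq

        class GridScheduler:
            def __init__(self):
                self._heap = []
                self._seq = 0  # FIFO tie-break within a priority level

            def enqueue(self, pipeline_priority, batch):
                # Lower number = higher priority, dispatched first.
                heapq.heappush(self._heap, (pipeline_priority, self._seq, batch))
                self._seq += 1

            def next_batch(self):
                return heapq.heappop(self._heap)[2] if self._heap else None

        sched = GridScheduler()
        sched.enqueue(2, "long-tail batch 1")
        sched.enqueue(1, "hot batch 1")
        sched.enqueue(1, "hot batch 2")
        while (b := sched.next_batch()) is not None:
            print(b)  # both hot batches drain before the long-tail batch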

    MAINTAINING THROUGHPUT OF A STREAM PROCESSING FRAMEWORK WHILE INCREASING PROCESSING LOAD

    Publication No.: US20170083368A1

    Publication Date: 2017-03-23

    Application No.: US14986401

    Application Date: 2015-12-31

    CPC classification number: G06F9/505 G06F3/0613 G06F3/0631 G06F3/067 G06F9/5061

    Abstract: The technology disclosed relates to maintaining throughput of a stream processing framework while increasing processing load. In particular, it relates to defining a container over at least one worker node that has a plurality of workers, with one worker utilizing a whole core within a worker node, and queuing data from one or more incoming near real-time (NRT) data streams in multiple pipelines that run in the container and have connections to at least one common resource external to the container. It further relates to concurrently executing the pipelines at a number of workers as batches, and limiting simultaneous connections to the common resource to the number of workers by providing a shared connection to a set of batches running on a same worker regardless of the pipelines to which the batches in the set belong.
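
    Complementing the connection-sharing sketch after the first listing of this abstract above, the fragment below illustrates the queuing side: events from incoming NRT streams are routed to per-pipeline queues inside the container and sealed into fixed-size batches for workers to pick up. The batch size and routing rule are assumptions for illustration, not taken from the patent.

        from collections import deque

        BATCH_SIZE = 3  # assumed; the patent does not fix a size

        class Pipeline:
            def __init__(self, name):
                self.name = name
                self.pending = []       # events not yet sealed
                self.batches = deque()  # sealed batches awaiting a worker

            def offer(self, event):
                self.pending.append(event)
                if len(self.pending) >= BATCH_SIZE:
                    self.batches.append(self.pending)
                    self.pending = []

        pipelines = {"clicks": Pipeline("clicks"), "logs": Pipeline("logs")}

        def on_stream_event(stream_name, event):
            pipelines[stream_name].offer(event)  # route to its pipeline

        for i in range(7):
            on_stream_event("clicks", f"click-{i}")
        print(len(pipelines["clicks"].batches))  # -> 2 sealed, 1 pending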
