-
21.
Publication No.: US20180253335A1
Publication Date: 2018-09-06
Application No.: US15973230
Application Date: 2018-05-07
Applicant: Salesforce.com, Inc.
Inventor: Elden Gregory Bishop , Jeffrey Chao
CPC classification number: G06F9/505 , G06F3/0613 , G06F3/0631 , G06F3/067 , G06F9/5061
Abstract: The technology disclosed relates to maintaining throughput of a stream processing framework while increasing processing load. In particular, it relates to defining a container over at least one worker node that has a plurality of workers, with one worker utilizing a whole core within a worker node, and queuing data from one or more incoming near real-time (NRT) data streams in multiple pipelines that run in the container and have connections to at least one common resource external to the container. It further relates to concurrently executing the pipelines at a number of workers as batches, and limiting simultaneous connections to the common resource to the number of workers by providing a shared connection to a set of batches running on the same worker regardless of the pipelines to which the batches in the set belong.
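As a rough illustration of the shared-connection scheme this abstract describes, the following minimal Python sketch (SharedConnection, Worker, and Container are hypothetical names, not taken from the patent) caps simultaneous connections at the number of workers by giving each worker a single connection that every batch on that worker reuses, whichever pipeline the batch comes from.

```python
# Minimal sketch with assumed names; not the patented implementation.
# Each worker owns exactly one connection to the external resource, so
# simultaneous connections never exceed the worker count, however many
# pipelines contribute batches to that worker.


class SharedConnection:
    """Stand-in for a connection to a resource external to the container."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def write(self, pipeline, batch):
        print(f"worker {self.worker_id} -> external resource: {pipeline}:{batch}")


class Worker:
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.connection = SharedConnection(worker_id)  # one shared connection per worker

    def run_batch(self, pipeline, batch):
        # Every batch scheduled on this worker reuses the same connection,
        # regardless of which pipeline the batch belongs to.
        self.connection.write(pipeline, batch)


class Container:
    """Container defined over a worker node with a fixed pool of workers."""

    def __init__(self, num_workers):
        self.workers = [Worker(i) for i in range(num_workers)]

    def dispatch(self, pipeline, batch_id, batch):
        # Pin the batch to a worker; adding pipelines adds no new connections.
        worker = self.workers[batch_id % len(self.workers)]
        worker.run_batch(pipeline, batch)


if __name__ == "__main__":
    container = Container(num_workers=4)
    for pipeline in ("clicks", "logins"):  # two pipelines, one container
        for batch_id in range(3):
            container.dispatch(pipeline, batch_id, f"batch-{batch_id}")
```

Running the sketch never opens more than four connections, one per worker, even though two pipelines share them.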
-
22.
Publication No.: US09946593B2
Publication Date: 2018-04-17
Application No.: US15004887
Application Date: 2016-01-22
Applicant: salesforce.com, inc.
Inventor: Elden Gregory Bishop , Jeffrey Chao
CPC classification number: G06F11/1471 , G06F11/14 , G06F11/1438 , G06F11/202 , G06F11/2035 , G06F11/2048 , G06F2201/84
Abstract: The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in the fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of the batch size of a given batch during its replay after a processing failure. This problem results in an over-count when the input during replay is not a superset of the input fed during the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in the replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.
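The replay problem described above can be illustrated with a hedged Python sketch (ReplaySafeCounters, apply_batch, and totals are assumed names, not the patented mechanism): each batch's per-key contribution is stored separately, so a replayed batch whose contents shrank, or whose keys disappeared, replaces its earlier contribution instead of being added on top of it.

```python
# Hedged sketch with assumed names; not the patented mechanism. Each batch's
# per-key contribution is kept separately, so replaying a batch whose contents
# changed overwrites the earlier contribution instead of double counting, and
# keys that disappeared on replay drop out of the totals.
from collections import Counter, defaultdict


class ReplaySafeCounters:
    def __init__(self):
        self.per_batch = defaultdict(Counter)  # batch_id -> per-key counts

    def apply_batch(self, batch_id, events):
        # Replace, never add to, whatever this batch contributed before.
        self.per_batch[batch_id] = Counter(events)

    def totals(self):
        total = Counter()
        for contribution in self.per_batch.values():
            total += contribution
        return total


if __name__ == "__main__":
    counters = ReplaySafeCounters()
    counters.apply_batch(7, ["a", "a", "b", "c"])  # original play of batch 7
    counters.apply_batch(7, ["a", "b"])            # replay after failure: smaller, "c" gone
    print(counters.totals())                       # Counter({'a': 1, 'b': 1}) -- no over-count
```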
-
23.
Publication No.: US20170083378A1
Publication Date: 2017-03-23
Application No.: US14986419
Application Date: 2015-12-31
Applicant: salesforce.com, inc.
Inventor: Elden Gregory Bishop , Jeffrey Chao
CPC classification number: G06F9/5038 , G06F9/5072 , G06F9/5088 , G06F17/30516
Abstract: The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads which process data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. The method also includes assigning a priority level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to the respective priority levels of the first and second pipelines.
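A minimal Python sketch of priority-ordered batch dispatch follows (GridScheduler, queue_batch, and dispatch are illustrative names; the grid-coordinator/grid-scheduler internals here are assumptions, not taken from the patent): all queued batches of a higher-priority pipeline are handed to physical threads before any batch of a lower-priority pipeline.

```python
# Illustrative sketch only; names and internals are assumptions. Batches are
# queued with their pipeline's priority, and dispatch drains them in strict
# priority order (lower number = higher priority) across the physical threads.
import heapq
import itertools


class GridScheduler:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break within one priority level

    def queue_batch(self, priority, pipeline, batch):
        heapq.heappush(self._heap, (priority, next(self._order), pipeline, batch))

    def dispatch(self, physical_threads):
        # Hand batches to the physical threads in strict priority order.
        thread_cycle = itertools.cycle(physical_threads)
        while self._heap:
            _, _, pipeline, batch = heapq.heappop(self._heap)
            print(f"thread {next(thread_cycle)} <- {pipeline}: batch {batch}")


if __name__ == "__main__":
    scheduler = GridScheduler()
    for batch in range(2):
        scheduler.queue_batch(priority=2, pipeline="audit-logs", batch=batch)
        scheduler.queue_batch(priority=1, pipeline="alerts", batch=batch)
    scheduler.dispatch(physical_threads=[0, 1, 2])  # "alerts" batches dispatch first
```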
-
24.
Publication No.: US20170083368A1
Publication Date: 2017-03-23
Application No.: US14986401
Application Date: 2015-12-31
Applicant: salesforce.com, inc.
Inventor: Elden Gregory Bishop , Jeffrey Chao
CPC classification number: G06F9/505 , G06F3/0613 , G06F3/0631 , G06F3/067 , G06F9/5061
Abstract: The technology disclosed relates to maintaining throughput of a stream processing framework while increasing processing load. In particular, it relates to defining a container over at least one worker node that has a plurality of workers, with one worker utilizing a whole core within a worker node, and queuing data from one or more incoming near real-time (NRT) data streams in multiple pipelines that run in the container and have connections to at least one common resource external to the container. It further relates to concurrently executing the pipelines at a number of workers as batches, and limiting simultaneous connections to the common resource to the number of workers by providing a shared connection to a set of batches running on the same worker regardless of the pipelines to which the batches in the set belong.
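Since this abstract repeats the mechanism of entry 21 (the sketch after that entry already covers the shared connection per worker), the sketch below instead illustrates the sizing rule both abstracts mention, one worker per whole core of the worker node, with the same count bounding connections to the shared external resource. plan_container is a hypothetical helper, and the core-detection detail is an assumption, not part of the patent.

```python
# Separate, hedged sketch of the "one worker per whole core" sizing rule; the
# helper and its fields are illustrative, not taken from the patent.
import multiprocessing


def plan_container(worker_node_cores=None):
    cores = worker_node_cores or multiprocessing.cpu_count()
    return {
        "workers": cores,                       # one worker per whole core
        "max_simultaneous_connections": cores,  # bounded by the worker count
    }


if __name__ == "__main__":
    print(plan_container())    # sized from the local machine's core count
    print(plan_container(16))  # sized for a 16-core worker node
```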