DISTRIBUTED METHOD AND APPARATUS FOR PROCESSING STREAMING DATA

    Publication (Announcement) No.: US20170374137A1

    Publication (Announcement) Date: 2017-12-28

    Application No.: US15281273

    Application Date: 2016-09-30

    CPC classification number: H04L67/10 H04L65/601 H04L65/605

    Abstract: The application discloses a distributed method and apparatus for processing streaming data. A specific implementation of the method comprises: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to the time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets, each of the second resilient distributed datasets comprising a limited number of data elements and corresponding to a time window; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; and passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing. This implementation enables operators defined on resilient distributed datasets to be reused.
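    The grouping step in the abstract can be sketched in plain Python. This is a minimal illustration and not the patented implementation. Ordinary lists stand in for resilient distributed datasets, and the names `Element`, `window_start` and `group_by_time_window` are hypothetical. The sketch also assumes fixed-size, non-overlapping (tumbling) windows aligned at 0, because the abstract does not state how the windows are defined.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Element:
    timestamp: int  # recorded timestamp of the data element (e.g. milliseconds)
    value: str


def window_start(ts: int, window_size: int) -> int:
    # Assumption: fixed-size, non-overlapping (tumbling) windows aligned at 0.
    return ts - ts % window_size


def group_by_time_window(
    first_dataset: Iterable[Element], window_size: int
) -> List[Tuple[int, List[Element]]]:
    """Hypothetical grouping step: bucket each element by the window its timestamp
    falls in. Each bucket is a finite inner dataset, and the list of buckets plays
    the role of the nested dataset."""
    groups: Dict[int, List[Element]] = defaultdict(list)
    for element in first_dataset:
        groups[window_start(element.timestamp, window_size)].append(element)
    # Order inner datasets by window start so they can be traversed successively.
    return sorted(groups.items())


stream = [Element(1005, "a"), Element(1999, "b"), Element(2001, "c"), Element(950, "d")]
nested = group_by_time_window(stream, window_size=1000)
```

    With a 1000 ms window, the element stamped 950 lands in the window starting at 0, the elements stamped 1005 and 1999 share the window starting at 1000, and 2001 opens the window at 2000. Every inner list is finite, so a batch operator can consume it unchanged.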

    Fault handling for computer nodes in stream computing system

    Publication (Announcement) No.: US11368506B2

    Publication (Announcement) Date: 2022-06-21

    Application No.: US15873744

    Application Date: 2018-01-17

    Abstract: The objective of the present invention is to provide a method, apparatus, computing node and computer program product for fault handling in a stream computing system. At a computing node, the method comprises: recording the arrival sequences of respective original data from an upstream computing node; performing a persistence operation on the respective original data according to a predetermined period; in the case of a failure and restart, restoring the to-be-computed data in internal storage from the persisted original data and/or from the upstream computing node, and replaying and computing the restored to-be-computed data according to their previous arrival sequences; and continuing to encode each fully computed result datum according to its offset within the last persistence period before the failure, and transmitting the encoded result data to the next node.
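    The recovery scheme in the abstract can be illustrated with a small single-process model. This is a hedged sketch, not the patented method, and several things in it are assumptions: all class and method names are hypothetical, the "predetermined period" is modelled as a record count instead of a time interval, and results are encoded as `(period index, offset within period)`. The sketch also assumes that every persisted record was already fully computed, so only the unpersisted tail (resent by the upstream node) is replayed. Finally, it assumes the next node uses the codes to discard replayed duplicates.

```python
from typing import Callable, Dict, List, Tuple

# ((period index, offset within that period), result): hypothetical encoding format
Encoded = Tuple[Tuple[int, int], str]


class ComputeNode:
    """Toy model of one computing node; all names here are hypothetical."""

    def __init__(self, compute: Callable[[str], str], period: int) -> None:
        self.compute = compute
        self.period = period          # persist after every `period` arrivals (count-based stand-in)
        self.durable: List[str] = []  # persisted originals in arrival order; survive a failure
        self.memory: List[str] = []   # originals received since the last persist; lost on failure

    def receive(self, payload: str) -> Encoded:
        seq = len(self.durable) + len(self.memory)  # recorded arrival sequence number
        self.memory.append(payload)
        # Encode by (period index, offset in period): a replayed record receives the
        # same code it had before the failure.
        encoded = (divmod(seq, self.period), self.compute(payload))
        if len(self.memory) == self.period:  # periodic persistence operation
            self.durable.extend(self.memory)
            self.memory.clear()
        return encoded

    def fail(self) -> None:
        self.memory.clear()  # in-memory data is lost; persisted data is not

    def resume_point(self) -> int:
        # Arrival sequence number from which the upstream node must resend.
        return len(self.durable)

    def recover(self, resent: List[str]) -> List[Encoded]:
        # Replay resent records in their original arrival order; encoding continues
        # from the offset at which the last persisted period ended.
        return [self.receive(payload) for payload in resent]


class NextNode:
    """Downstream node that keeps the first result per code, dropping replayed duplicates."""

    def __init__(self) -> None:
        self.results: Dict[Tuple[int, int], str] = {}

    def accept(self, encoded: Encoded) -> None:
        code, result = encoded
        self.results.setdefault(code, result)


upstream_log = ["a", "b", "c", "d", "e"]  # upstream keeps what it has sent
node, nxt = ComputeNode(str.upper, period=2), NextNode()
for payload in upstream_log:
    nxt.accept(node.receive(payload))
node.fail()  # "e" had not been persisted yet
for encoded in node.recover(upstream_log[node.resume_point():]):
    nxt.accept(encoded)
```

    After the failure, the node resumes at sequence number 4. It replays "e" and gives the result the same code `(2, 0)` it had before the crash, so the next node ends up with exactly one result per original record.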

    Distributed method and apparatus for processing streaming data

    Publication (Announcement) No.: US10313430B2

    Publication (Announcement) Date: 2019-06-04

    Application No.: US15281273

    Application Date: 2016-09-30

    Abstract: A distributed method and apparatus for processing streaming data are disclosed. A specific implementation of the method includes: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to a time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets comprising a limited number of data elements and respectively corresponding to the time windows; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing.
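    The final step of the abstract passes each inner dataset of the nested dataset, in turn, to an existing batch operator. A minimal sketch of that traversal follows, assuming the data has already been grouped by time window. Plain Python lists stand in for resilient distributed datasets, and `traverse` and `word_count` are hypothetical names. In a Spark-style system, the batch operator would instead run distributed over each inner dataset.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def traverse(
    nested: List[Tuple[int, List[T]]], batch_op: Callable[[List[T]], R]
) -> List[Tuple[int, R]]:
    """Hypothetical traversal operator: pass each finite inner dataset, one window
    after another, to an unmodified batch operator and keep its result per window."""
    return [(start, batch_op(inner)) for start, inner in nested]


def word_count(batch: Iterable[str]) -> Counter:
    # An ordinary batch operator written for a finite dataset; it knows nothing
    # about streams or time windows.
    return Counter(word for line in batch for word in line.split())


# Nested dataset: (window start, inner dataset) pairs, already grouped by time window.
nested = [(0, ["a b", "b"]), (1000, ["c a"]), (2000, [])]
results = traverse(nested, word_count)
```

    The design point is that `word_count` is never rewritten for streaming. The traversal operator alone adapts the windowed stream to the batch interface, which is the operator reuse the first abstract claims.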
