Distributed method and apparatus for processing streaming data

    Publication (announcement) No.: US10313430B2

    Publication (announcement) date: 2019-06-04

    Application No.: US15281273

    Filing date: 2016-09-30

    Abstract: A distributed method and apparatus for processing streaming data are disclosed. A specific implementation of the method includes: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to a time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets comprising a limited number of data elements and respectively corresponding to the time windows; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing.
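The time-window grouping step in the abstract above can be sketched in plain Python. This is a minimal, non-distributed illustration under stated assumptions, not the patented implementation: Python lists stand in for resilient distributed datasets, each element is a hypothetical (timestamp, payload) pair, and windows are assumed to be fixed-size, non-overlapping (tumbling) windows, since the abstract does not specify the window type. The names `window_start` and `group_by_time_window` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical element type: (timestamp_seconds, payload). Plain Python lists
# stand in for resilient distributed datasets; nothing here is distributed.
Element = Tuple[float, Any]


def window_start(timestamp: float, window_size: float) -> float:
    """Start of the fixed (tumbling) time window that contains the timestamp."""
    return (timestamp // window_size) * window_size


def group_by_time_window(first_dataset: List[Element],
                         window_size: float) -> Dict[float, List[Element]]:
    """Assign each element to the group of the window its timestamp belongs to.

    Returns a nested dataset: window start -> finite "second" dataset.
    """
    groups: Dict[float, List[Element]] = defaultdict(list)
    for element in first_dataset:
        ts, _ = element
        groups[window_start(ts, window_size)].append(element)
    # Sort by window start so the nested dataset can be traversed in time order.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    stream = [(0.5, "a"), (3.2, "b"), (10.0, "c"), (12.9, "d"), (21.0, "e")]
    nested = group_by_time_window(stream, window_size=10.0)
    for start, batch in nested.items():
        print(start, batch)
```

Each value in the returned dict is finite, which is what lets a batch-oriented operator consume it afterwards.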

Distributed method and apparatus for processing streaming data

    Publication (announcement) No.: US20170374137A1

    Publication (announcement) date: 2017-12-28

    Application No.: US15281273

    Filing date: 2016-09-30

    CPC classification number: H04L67/10 H04L65/601 H04L65/605

    Abstract: The application discloses a distributed method and apparatus for processing streaming data. A specific implementation of the method comprises: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to the time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets, each of the second resilient distributed datasets comprising a limited number of data elements and corresponding to a respective time window; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing. This implementation allows the operators of the resilient distributed dataset to be reused.
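The traversal step, and the operator reuse it is credited with, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a dict from window start to a finite list stands in for the nested dataset, and the names `traverse` and `count_payloads` are hypothetical. What it shows is that `count_payloads` is an ordinary batch function with no knowledge of streams or windows, yet it is applied unchanged to every time window.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Element = Tuple[float, Any]
# Hypothetical types: a nested dataset maps a window start to a finite batch.
NestedDataset = Dict[float, List[Element]]
BatchOperator = Callable[[List[Element]], Any]


def traverse(nested: NestedDataset,
             batch_op: BatchOperator) -> Iterator[Tuple[float, Any]]:
    """Traversal operator: hand each finite batch, in window order, to an
    operator written for finite (batch) datasets."""
    for start in sorted(nested):
        yield start, batch_op(nested[start])


def count_payloads(batch: List[Element]) -> Dict[Any, int]:
    """An ordinary batch operator; it knows nothing about streams or windows."""
    counts: Dict[Any, int] = {}
    for _, payload in batch:
        counts[payload] = counts.get(payload, 0) + 1
    return counts


if __name__ == "__main__":
    nested = {
        10.0: [(10.0, "x"), (12.9, "y")],
        0.0: [(0.5, "x"), (3.2, "x")],
    }
    for start, result in traverse(nested, count_payloads):
        print(start, result)
```

Because the traversal is a generator, windows are processed one at a time and in order; any other function taking a finite list could be passed in place of `count_payloads`.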

Method and apparatus for executing distributed computing task

    Publication (announcement) No.: US11379499B2

    Publication (announcement) date: 2022-07-05

    Application No.: US16293360

    Filing date: 2019-03-05

    Abstract: A method and apparatus for executing a distributed computing task are provided. The method can include: parsing an expression of the distributed computing task to obtain an operator keyword; and executing, by using an operator corresponding to the operator keyword, the distributed computing task based on an input parameter of the operator, the input parameter of the operator including at least one of: a distributed dataset, stored in a distributed manner, that includes at least one data element; and a distributed key-value pair set, stored in a distributed manner, that includes at least one key-value pair, where a value of a key-value pair in the distributed key-value pair set is a distributed dataset or a distributed key-value pair set.
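The parse-then-dispatch flow described above can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the `keyword(args)` expression syntax, the operator keywords `count` and `keys`, the function names `parse_operator_keyword` and `execute`, and the use of Python lists and dicts as single-machine stand-ins for distributed datasets and distributed key-value pair sets. A dict value may itself be a list or a dict, mirroring the nested input parameters in the abstract.

```python
import re
from typing import Any, Callable, Dict, List, Union

# Hypothetical stand-ins: a list plays a "distributed dataset" and a dict plays
# a "distributed key-value pair set". A dict value may itself be a list or a
# dict, mirroring the nested input parameters described in the abstract.
Dataset = List[Any]
KVSet = Dict[Any, Any]
Input = Union[Dataset, KVSet]


def parse_operator_keyword(expression: str) -> str:
    """Return the operator keyword of an expression such as 'count(clicks)'.

    The 'keyword(args)' syntax is assumed for this sketch; the argument text
    is ignored because the input parameter is passed to execute() directly.
    """
    match = re.match(r"\s*([A-Za-z_]\w*)\s*\(", expression)
    if match is None:
        raise ValueError(f"no operator keyword in {expression!r}")
    return match.group(1)


def count_elements(data: Input) -> int:
    """Count data elements, descending into nested datasets and key-value sets."""
    if isinstance(data, dict):
        # A nested value contributes its own element count; a plain value counts once.
        return sum(count_elements(v) if isinstance(v, (list, dict)) else 1
                   for v in data.values())
    return len(data)


def list_keys(data: Input) -> List[Any]:
    """Return the keys of a key-value pair set; datasets have no keys."""
    if not isinstance(data, dict):
        raise TypeError("keys() requires a key-value pair set")
    return list(data.keys())


# Operator registry: maps an operator keyword to the function that executes it.
OPERATORS: Dict[str, Callable[[Input], Any]] = {
    "count": count_elements,
    "keys": list_keys,
}


def execute(expression: str, input_param: Input) -> Any:
    """Parse the expression, look up its operator, and run it on the input."""
    keyword = parse_operator_keyword(expression)
    if keyword not in OPERATORS:
        raise ValueError(f"unknown operator {keyword!r}")
    return OPERATORS[keyword](input_param)


if __name__ == "__main__":
    clicks = {"user1": [1, 2, 3], "user2": {"mobile": [4], "web": [5, 6]}}
    print(execute("count(clicks)", clicks))  # 3 + (1 + 2) = 6
    print(execute("keys(clicks)", clicks))
```

Keeping operators in a keyword-indexed registry means a new operator is added by registering one function, without touching the parser.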
