-
Publication No.: US20170374137A1
Publication date: 2017-12-28
Application No.: US15281273
Application date: 2016-09-30
Inventor: Yao Xu, Cong Wang, Yuncong Zhang, Jianwei Zhang, Xin Huang
CPC classification number: H04L67/10, H04L65/601, H04L65/605
Abstract: The application discloses a distributed method and apparatus for processing streaming data. A specific implementation of the method comprises: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset to a group corresponding to the time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets, each of the second resilient distributed datasets comprising a limited number of data elements and corresponding to one of the time windows; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; and passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing. This implementation enables reuse of the operators of the resilient distributed dataset.
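The grouping and traversal steps described in this abstract can be illustrated with a minimal, single-process sketch. It is not the patented implementation: plain Python lists stand in for resilient distributed datasets, and the element shape `(timestamp, payload)`, the fixed window length, and the names `group_by_window`, `traverse`, and `count_elements` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical element shape: (recorded timestamp in seconds, payload).
Element = Tuple[int, str]


def group_by_window(elements: Iterable[Element],
                    window_seconds: int) -> Dict[int, List[Element]]:
    """Assign each element to the window that contains its recorded timestamp.

    The returned dict plays the role of the nested dataset: each value is a
    finite per-window dataset, keyed by the start time of its window.
    """
    groups: Dict[int, List[Element]] = defaultdict(list)
    for timestamp, payload in elements:
        window_start = (timestamp // window_seconds) * window_seconds
        groups[window_start].append((timestamp, payload))
    return dict(groups)


def traverse(nested: Dict[int, List[Element]],
             batch_operator: Callable[[List[Element]], int]) -> Dict[int, int]:
    """Pass each per-window dataset, in window order, to a batch operator."""
    return {start: batch_operator(nested[start]) for start in sorted(nested)}


def count_elements(batch: List[Element]) -> int:
    """A batch operator written for finite datasets; it knows nothing of streams."""
    return len(batch)


# A small stand-in for received streaming data.
stream = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (25, "e")]
nested = group_by_window(stream, window_seconds=10)
counts = traverse(nested, count_elements)
print(counts)
```

The point of the sketch is that `count_elements` is an ordinary batch operator: windowing turns the stream into finite datasets, so the same operator can be applied to each window unchanged.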
-
Publication No.: US10313430B2
Publication date: 2019-06-04
Application No.: US15281273
Application date: 2016-09-30
Inventor: Yao Xu, Cong Wang, Yuncong Zhang, Jianwei Zhang, Xin Huang
Abstract: A distributed method and apparatus for processing streaming data are disclosed. A specific implementation of the method includes: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to a time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets comprising a limited number of data elements and respectively corresponding to the time windows; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing.
-
Publication No.: US11132363B2
Publication date: 2021-09-28
Application No.: US16352576
Application date: 2019-03-13
Inventor: Jianwei Zhang, Yuncong Zhang, Cong Wang, Yao Xu, Chunyang Wen, Xin Huang, Zhan Song, Guanyin Zhu
IPC: G06F16/24, G06F16/2453, G06F9/50, G06F40/205, G06F16/242, G06F16/182, G06F16/22
Abstract: A distributed computing framework and a distributed computing method are provided. A specific embodiment of the distributed computing framework includes: a parsing unit, configured to parse an expression of a distributed computing task and determine an operator and a field corresponding to the operator; and an operator unit, configured to provide the operator, the input parameters of the operator including the field and a field-type distributed dataset. The parameters received and returned by any operator may be of the field-type distributed dataset type, and any operator may operate on the data corresponding to the field in the field-type distributed dataset. Therefore, each operator needs to be implemented only once, which enables operator reuse. The distributed computing task is written as a simple expression, which makes it easier for the user to write a distributed computing program with the distributed computing framework.
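The split between a parsing unit and an operator unit can be illustrated with a minimal sketch. The abstract does not specify an expression syntax or an operator set, so everything here is an assumption made for illustration: the `operator(field)` expression form, the `sum` and `max` operators, the names `parse` and `run`, and the use of a list of dicts as a stand-in for a field-type distributed dataset.

```python
import re
from typing import Callable, Dict, List, Tuple

# Stand-in for a field-type distributed dataset: rows with named fields.
Dataset = List[Dict[str, float]]
Operator = Callable[[str, Dataset], Dataset]


def op_sum(field: str, data: Dataset) -> Dataset:
    """Sum one field; both the input and the result are field-type datasets."""
    return [{field: sum(row[field] for row in data)}]


def op_max(field: str, data: Dataset) -> Dataset:
    """Take the maximum of one field, again returning a field-type dataset."""
    return [{field: max(row[field] for row in data)}]


# Operator unit: each operator is implemented once and works on any field.
OPERATORS: Dict[str, Operator] = {"sum": op_sum, "max": op_max}


def parse(expression: str) -> Tuple[Operator, str]:
    """Parsing unit: extract the operator and its field from 'name(field)'."""
    match = re.fullmatch(r"\s*(\w+)\(\s*(\w+)\s*\)\s*", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    name, field = match.groups()
    if name not in OPERATORS:
        raise ValueError(f"unknown operator: {name!r}")
    return OPERATORS[name], field


def run(expression: str, data: Dataset) -> Dataset:
    """Parse the expression, then apply the operator to the named field."""
    operator, field = parse(expression)
    return operator(field, data)


rows = [{"price": 3.0, "qty": 2.0}, {"price": 5.0, "qty": 1.0}]
total = run("sum(price)", rows)
largest = run("max(qty)", rows)
print(total, largest)
```

Because every operator takes a field name plus a dataset and returns a dataset of the same kind, the same `op_sum` serves `price`, `qty`, or any other field without a per-field implementation.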
-
-