-
Publication No.: US10313430B2
Publication Date: 2019-06-04
Application No.: US15281273
Application Date: 2016-09-30
Inventors: Yao Xu, Cong Wang, Yuncong Zhang, Jianwei Zhang, Xin Huang
Abstract: A distributed method and apparatus for processing streaming data are disclosed. A specific implementation of the method includes: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to a time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets comprising a limited number of data elements and respectively corresponding to the time windows; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing.
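The time-window grouping described in this abstract can be illustrated with a minimal, single-process Python sketch. It is not the patented implementation. Resilient distributed datasets are modeled here as plain lists of (timestamp, payload) pairs, nothing is actually distributed, and the names group_by_time_window, traverse and sum_payloads are hypothetical. The sketch shows the three steps: assign each element to the window its timestamp falls into, collect one finite inner dataset per window into a nested dataset, and pass each inner dataset in turn to an ordinary batch operator.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# Assumption: a "resilient distributed dataset" is modeled as an in-memory list;
# each element is a (recorded_timestamp_seconds, payload) pair.
Event = Tuple[float, Any]
Dataset = List[Event]


def group_by_time_window(first: Dataset, window_seconds: float) -> List[Dataset]:
    """Group elements by the time window their timestamp falls into.

    Returns a nested dataset: a list of finite inner datasets, one per
    non-empty window, ordered by window start time.
    """
    groups: Dict[int, Dataset] = defaultdict(list)
    for ts, payload in first:
        window_index = int(ts // window_seconds)  # floor division picks the window
        groups[window_index].append((ts, payload))
    return [groups[i] for i in sorted(groups)]


def traverse(nested: List[Dataset], batch_op: Callable[[Dataset], Any]) -> List[Any]:
    """Hypothetical traversal operator: feed each inner dataset, in window order,
    to an ordinary batch operator written for finite datasets."""
    return [batch_op(inner) for inner in nested]


def sum_payloads(batch: Dataset) -> Any:
    """An example batch operator that knows nothing about streams or windows."""
    return sum(payload for _, payload in batch)


if __name__ == "__main__":
    stream = [(0.5, 3), (1.2, 4), (0.9, 1), (2.1, 10)]
    nested = group_by_time_window(stream, window_seconds=1.0)
    print(traverse(nested, sum_payloads))  # [4, 4, 10]
```

The point of the design is that sum_payloads is written once for finite data and is reused unchanged on every window of the stream.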
-
Publication No.: US11379499B2
Publication Date: 2022-07-05
Application No.: US16293360
Application Date: 2019-03-05
Inventors: Yuncong Zhang, Xiang Wen, Cong Wang, Hua Chai, Yao Xu
IPC: G06F16/27
Abstract: A method and apparatus for executing a distributed computing task are provided. The method can include: parsing an expression of the distributed computing task to obtain an operator keyword; and executing, by using the operator corresponding to the operator keyword, the distributed computing task based on an input parameter of the operator. The input parameter includes at least one of: a distributed dataset that is stored in a distributed manner and includes at least one data element, or a distributed key-value pair set that is stored in a distributed manner and includes at least one key-value pair, where the value of a key-value pair in the distributed key-value pair set is a distributed dataset or a distributed key-value pair set.
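The keyword-driven execution in this abstract can be sketched in a few lines of Python, under stated assumptions. This is an illustration, not the patented method. The expression syntax "<keyword>(<input name>)", the operator registry OPERATORS, and the helpers lift and execute are all hypothetical. A distributed dataset is modeled as a list, and a distributed key-value pair set as a dict whose values are lists or further dicts, so an operator applied to a key-value set recurses into its nested values.

```python
import re
from typing import Any, Callable, Dict, List, Union

# Assumptions: a distributed dataset is modeled as a list; a distributed
# key-value pair set as a dict whose values are lists or further dicts.
KVSet = Dict[str, Any]
Input = Union[List[Any], KVSet]

# Hypothetical expression syntax: "<keyword>(<input name>)", e.g. "sum(clicks)".
EXPR = re.compile(r"^\s*(\w+)\s*\(\s*(\w+)\s*\)\s*$")


def lift(batch_fn: Callable[[List[Any]], Any]) -> Callable[[Input], Any]:
    """Wrap a dataset-level function so it also accepts key-value sets,
    recursing into values that are datasets or nested key-value sets."""
    def op(arg: Input) -> Any:
        if isinstance(arg, dict):
            return {k: op(v) for k, v in arg.items()}
        return batch_fn(arg)
    return op


# Hypothetical registry mapping operator keywords to operators.
OPERATORS: Dict[str, Callable[[Input], Any]] = {
    "sum": lift(sum),
    "count": lift(len),
    "max": lift(max),
}


def execute(expression: str, inputs: Dict[str, Input]) -> Any:
    """Parse the expression to get the operator keyword, then run that operator
    on the named input parameter."""
    match = EXPR.match(expression)
    if match is None:
        raise ValueError(f"cannot parse expression: {expression!r}")
    keyword, input_name = match.groups()
    if keyword not in OPERATORS:
        raise KeyError(f"unknown operator keyword: {keyword!r}")
    return OPERATORS[keyword](inputs[input_name])


if __name__ == "__main__":
    data = {
        "clicks": {"2019": [1, 2, 3], "2020": {"jan": [4], "feb": [5, 6]}},
        "flat": [7, 8],
    }
    print(execute("sum(clicks)", data))  # {'2019': 6, '2020': {'jan': 4, 'feb': 11}}
    print(execute("count(flat)", data))  # 2
```

Because each operator handles both input shapes, the same keyword works on a flat dataset and on a key-value set nested to any depth.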
-
Publication No.: US11132363B2
Publication Date: 2021-09-28
Application No.: US16352576
Application Date: 2019-03-13
Inventors: Jianwei Zhang, Yuncong Zhang, Cong Wang, Yao Xu, Chunyang Wen, Xin Huang, Zhan Song, Guanyin Zhu
IPC: G06F16/24, G06F16/2453, G06F9/50, G06F40/205, G06F16/242, G06F16/182, G06F16/22
Abstract: A distributed computing framework and a distributed computing method are provided. A specific embodiment of the distributed computing framework includes: a parsing unit, configured to parse an expression of a distributed computing task and determine an operator and a field corresponding to the operator; and an operator unit, configured to provide the operator, the input parameters of the operator including the field and a field-type distributed dataset. The parameters received and returned by any operator may be field-type distributed datasets, and any operator may operate on the data corresponding to the field in the field-type distributed dataset. Therefore, each operator needs to be implemented only once and can then be reused. Because the distributed computing task is written as a simple expression, the framework reduces the complexity for users writing distributed computing programs.
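The operator-reuse argument in this abstract (an operator takes a field plus a field-type dataset and returns a field-type dataset) can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the framework's actual API. A field-type distributed dataset is modeled as a list of dict records, and map_field and filter_field are hypothetical operators. Each is written once and works on whichever field it is given, and because both return a field-type dataset, calls can be chained.

```python
from typing import Any, Callable, Dict, List

# Assumption: a "field-type distributed dataset" is modeled as a list of records,
# each record a dict mapping field names to values (no real distribution here).
Record = Dict[str, Any]
FieldDataset = List[Record]


def map_field(dataset: FieldDataset, field: str, fn: Callable[[Any], Any]) -> FieldDataset:
    """Hypothetical operator: apply fn to one named field of every record.

    Other fields pass through unchanged, and the result is again a
    field-type dataset, so operators compose."""
    return [{**rec, field: fn(rec[field])} for rec in dataset]


def filter_field(dataset: FieldDataset, field: str, pred: Callable[[Any], bool]) -> FieldDataset:
    """Hypothetical operator: keep records whose named field satisfies pred."""
    return [rec for rec in dataset if pred(rec[field])]


if __name__ == "__main__":
    orders = [
        {"user": "ann", "price": 10, "qty": 2},
        {"user": "bob", "price": 5, "qty": 7},
        {"user": "cy", "price": 20, "qty": 1},
    ]
    # The same map_field implementation is reused on a text field and a numeric field.
    upper_users = map_field(orders, "user", str.upper)
    doubled_prices = map_field(orders, "price", lambda p: p * 2)
    # Chaining works because every operator returns a field-type dataset.
    print(filter_field(upper_users, "qty", lambda q: q > 1))
    print(doubled_prices)
```

In this sketch the field name is an ordinary argument, so no per-field variant of an operator is needed; a parsing unit would only have to extract the operator and field from the expression.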
-
Publication No.: US20170374137A1
Publication Date: 2017-12-28
Application No.: US15281273
Application Date: 2016-09-30
Inventors: Yao Xu, Cong Wang, Yuncong Zhang, Jianwei Zhang, Xin Huang
CPC classification number: H04L67/10, H04L65/601, H04L65/605
Abstract: The application discloses a distributed method and apparatus for processing streaming data. A specific implementation of the method comprises: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to the time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets, each of the second resilient distributed datasets comprising a limited number of data elements and corresponding to one time window; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing. This implementation enables reuse of the operators defined on resilient distributed datasets.
-