-
公开(公告)号:US11416283B2
公开(公告)日:2022-08-16
申请号:US16503145
申请日:2019-07-03
Inventor: Weikang Gao , Yanlin Wang , Yue Xing , Jianwei Zhang , Yi Cheng
IPC: G06F9/48 , G06F9/50 , G06F9/54 , G06F9/4401 , G06F9/455
Abstract: A method and apparatus for processing stream data are provided. The method may include: acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
-
公开(公告)号:US20180205776A1
公开(公告)日:2018-07-19
申请号:US15873744
申请日:2018-01-17
Inventor: Ran Shi , Yi Cheng , Jianwei Zhang , Weikang Gao
CPC classification number: H04L65/60 , G06F11/0709 , G06F11/0793 , G06F11/1407 , G06F11/1438 , G06F11/1471 , G06F2201/84 , H04L67/1097 , H04L69/40
Abstract: The objective of the present invention is to provide a method, apparatus, computing node and computer program product for fault handling in a stream computing system. Here, at a computing node, recording arrival sequences of respective original data from a upstream computing node; performing persistence operation on the respective original data according to a predetermined period; in the case of failure and restart, restoring to-be-computed data in internal storage from the original data subjected to the persistent operation and/or the upstream computing node, and replaying and computing the restored to-be-computed data according to the respective previous arrival sequences; continuing encoding each completely computed result data according to offset of the result data in the last persistent operation period before the failure and transmitting the encoded result data to a next node.
-
公开(公告)号:US20170374137A1
公开(公告)日:2017-12-28
申请号:US15281273
申请日:2016-09-30
Inventor: Yao Xu , Cong Wang , Yuncong Zhang , Jianwei Zhang , Xin Huang
CPC classification number: H04L67/10 , H04L65/601 , H04L65/605
Abstract: The application discloses a distributed method and apparatus for processing streaming data. A specific implementation of the method comprises: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation of the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to the time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets, each of the second resilient distributed datasets comprising a limited number of data elements and corresponding to the time window; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing. This implementation achieves the reuse of the operator in the resilient distributed dataset.
-
公开(公告)号:US11368506B2
公开(公告)日:2022-06-21
申请号:US15873744
申请日:2018-01-17
Inventor: Ran Shi , Yi Cheng , Jianwei Zhang , Weikang Gao
Abstract: The objective of the present invention is to provide a method, apparatus, computing node and computer program product for fault handling in a stream computing system. Here, at a computing node, recording arrival sequences of respective original data from a upstream computing node; performing persistence operation on the respective original data according to a predetermined period; in the case of failure and restart, restoring to-be-computed data in internal storage from the original data subjected to the persistent operation and/or the upstream computing node, and replaying and computing the restored to-be-computed data according to the respective previous arrival sequences; continuing encoding each completely computed result data according to offset of the result data in the last persistent operation period before the failure and transmitting the encoded result data to a next node.
-
公开(公告)号:US11132363B2
公开(公告)日:2021-09-28
申请号:US16352576
申请日:2019-03-13
Inventor: Jianwei Zhang , Yuncong Zhang , Cong Wang , Yao Xu , Chunyang Wen , Xin Huang , Zhan Song , Guanyin Zhu
IPC: G06F16/24 , G06F16/2453 , G06F9/50 , G06F40/205 , G06F16/242 , G06F16/182 , G06F16/22
Abstract: A distributed computing framework and a distributed computing method are provided. A specific embodiment of the distributed computing framework includes: a parsing unit, configured to parse an expression of a distributed computing task, and determine an operator and a field corresponding to the operator; and an operator unit, configured to provide the operator, input parameters of the operator including: the field and a field-type distributed dataset. The type of parameters received and returned by any operator may be the field-type distributed dataset, and any operator may operate on the data corresponding to the field in the field-type distributed dataset. Therefore, any operator needs to be implemented once to realize the reuse of the operator. The distributed computing task is expressed in a simple expression, which simplifies the complexity of writing a distributed computing program with the distributed computing framework used by the user.
-
公开(公告)号:US10313430B2
公开(公告)日:2019-06-04
申请号:US15281273
申请日:2016-09-30
Inventor: Yao Xu , Cong Wang , Yuncong Zhang , Jianwei Zhang , Xin Huang
Abstract: A distributed method and apparatus for processing streaming data are disclosed. A specific implementation of the method includes: encapsulating received streaming data as a first resilient distributed dataset; performing a grouping operation on the first resilient distributed dataset based on time windows, the grouping operation comprising: assigning each data element in the first resilient distributed dataset into a group corresponding to a time window to which a recorded timestamp of the data element belongs, and forming second resilient distributed datasets comprising a limited number of data elements and respectively corresponding to the time windows; encapsulating the second resilient distributed datasets as a nested dataset comprising a plurality of the second resilient distributed datasets; passing, using a predefined traversal operator, each of the second resilient distributed datasets in the nested dataset successively to a batch operator defined in a finite dataset to perform distributed data processing.
-
-
-
-
-