Near-real-time data processing with partition files
Abstract:
Embodiments disclosed herein relate to implementing a near-real-time stream processing system on the same distributed file system that a batch processing system uses. A data container and partition files are generated according to a partition window, a time range that controls when data is included in each partition file. The data container is scanned to determine whether the partition files fall within a partition lifetime window, a time range that controls how long the partition files remain active for processing. For each partition file within the lifetime window, processing tasks are created based on the amount of data in that partition file. The data in the partition files is then accessed and the processing tasks are performed. Information about the partition files is recorded in a configuration data store.
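The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (`PartitionFile`, `partition_start`, `plan_tasks`, `bytes_per_task`, the dictionary standing in for the configuration data store) is hypothetical. It also assumes that a file's lifetime is measured from the start of its partition window and that the number of tasks is proportional to file size; the abstract does not specify either detail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import ceil


@dataclass
class PartitionFile:
    name: str
    window_start: datetime  # start of the partition window this file covers
    size_bytes: int         # amount of data currently in the file


def partition_start(event_time: datetime, partition_window: timedelta) -> datetime:
    """Floor an event time to the start of its partition window."""
    epoch = datetime(1970, 1, 1)
    elapsed_windows = (event_time - epoch) // partition_window
    return epoch + elapsed_windows * partition_window


def plan_tasks(container, now, lifetime_window, bytes_per_task, config_store):
    """Scan the container and create tasks for files still inside the lifetime window.

    Assumption: a file is active while (now - window_start) <= lifetime_window.
    Assumption: one task is created per `bytes_per_task` bytes, rounded up.
    """
    tasks = []
    for pf in container:
        if now - pf.window_start > lifetime_window:
            continue  # file has aged out of the lifetime window
        n_tasks = ceil(pf.size_bytes / bytes_per_task)
        tasks.extend((pf.name, i) for i in range(n_tasks))
        # Record information about the file in the (stand-in) configuration store.
        config_store[pf.name] = {"window_start": pf.window_start, "tasks": n_tasks}
    return tasks
```

With a five-minute partition window, an event at 12:07:30 would land in the partition file whose window starts at 12:05. A file holding 250 bytes with `bytes_per_task=100` would be split into three tasks, and a file older than the lifetime window would be skipped.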