A Performance Optimization Method Based on Dynamic Topology for Stream Computing and Its Implementation in Storm

Authors
Lu J.-W. [1]
Wu H. [1]
Chen H. [2]
Zhang Y.-M. [1]
Liang Q.-H. [3]
Xiao G. [1]
Affiliations
[1] Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, Zhejiang
[2] Team of Big Data Computing and Service, Department of Infrastructure Business, Alibaba, Hangzhou, 310011, Zhejiang
[3] School of Computer Science and Engineering, Nanyang Technological University, Singapore
Keywords
Big data; Data stream topology; Performance optimization; Stream computing; Stream computing system
DOI
10.3969/j.issn.0372-2112.2020.05.007
Abstract
Responsiveness and stability have long been two central problems in stream computing. As the volume of data processed in real time grows, data processing latency and topology instability increase, and many limitations of stream processing systems become apparent. To address these problems, we present a performance optimization method based on dynamic topology for stream computing, consisting of four techniques: (1) Dynamic step-by-step backpressure: a task in the topology dynamically adjusts the rate of upstream data transmission according to its current load. (2) Stateless topology data replay: the topology achieves data fault tolerance autonomously, without maintaining the computation state of the data. (3) Adaptive topology replacement: the system adjusts task concurrency on its own, without suspending the topology. (4) Delayed persistent queue: disk I/O reads and writes are deferred out of the data processing path, which mitigates the impact of high-frequency I/O blocking in the stream computing system. All four techniques are implemented in Apache Storm. The experimental results show that the optimized system not only improves its dynamic matching capability for big data, but also achieves, in the best case, 17% higher throughput and 20% faster data processing. © 2020, Chinese Institute of Electronics. All rights reserved.
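The backpressure idea in item (1) can be illustrated with a small simulation. This is only a minimal sketch of the general principle stated in the abstract (a task throttles its upstream sender according to its own load); it is not the authors' algorithm and uses no Apache Storm API. The class `Task`, the function `simulate`, the queue-length thresholds, and the halving and increment policy are all hypothetical choices made for illustration.

```python
from collections import deque


class Task:
    """Hypothetical topology task with a bounded input queue."""

    def __init__(self, capacity, high_mark, low_mark):
        self.queue = deque()
        self.capacity = capacity
        self.high_mark = high_mark  # queue length at which upstream is slowed
        self.low_mark = low_mark    # queue length at which upstream may speed up

    def adjust_upstream_rate(self, rate, max_rate):
        """Return the rate (tuples per tick) the upstream task should use next."""
        if len(self.queue) >= self.high_mark:
            return max(1, rate // 2)        # overloaded: halve the upstream rate
        if len(self.queue) <= self.low_mark:
            return min(max_rate, rate + 1)  # lightly loaded: recover gradually
        return rate                         # in between: keep the current rate


def simulate(ticks, capacity=10, service_rate=2, max_rate=4):
    """Run the assumed send/process/adjust loop and record (rate, queue length)."""
    task = Task(capacity, high_mark=8, low_mark=3)
    rate = max_rate
    history = []
    for _ in range(ticks):
        # Upstream sends at most `rate` tuples, and never more than fit.
        for _ in range(min(rate, capacity - len(task.queue))):
            task.queue.append(object())
        # The task processes up to `service_rate` tuples per tick.
        for _ in range(min(service_rate, len(task.queue))):
            task.queue.popleft()
        # The task reports a new sending rate to its upstream neighbour.
        rate = task.adjust_upstream_rate(rate, max_rate)
        history.append((rate, len(task.queue)))
    return history


if __name__ == "__main__":
    for rate, backlog in simulate(12):
        print(f"next upstream rate={rate}, backlog={backlog}")
```

In this toy setting the upstream rate falls once the backlog reaches the high mark and rises again only after the backlog drains to the low mark, so the queue never exceeds its capacity. In a multi-stage topology the same adjustment would be applied at each hop, which is one plausible reading of "step-by-step".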
Pages: 878-890
Number of pages: 12