Automatic Performance Tuning for Distributed Data Stream Processing Systems

Cited by: 7
Authors
Herodotou, Herodotos [1]
Odysseos, Lambros [1]
Chen, Yuxing [2]
Lu, Jiaheng [3]
Affiliations
[1] Cyprus Univ Technol, Limassol, Cyprus
[2] Tencent Inc, Shenzhen, Peoples R China
[3] Univ Helsinki, Helsinki, Finland
Keywords
Performance tuning; data stream processing; parameter tuning; Storm; Flink; Spark Streaming
DOI
10.1109/ICDE53745.2022.00296
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distributed data stream processing systems (DSPSs) such as Storm, Flink, and Spark Streaming are now routinely used to process continuous data streams in (near) real-time. However, achieving the low latency and high throughput demanded by today's streaming applications can be a daunting task, especially since the performance of DSPSs highly depends on a large number of system parameters that control load balancing, degree of parallelism, buffer sizes, and various other aspects of system execution. This tutorial offers a comprehensive review of the state-of-the-art automatic performance tuning approaches that have been proposed in recent years. The approaches are organized into five main categories based on their methodologies and features: cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. The categories of approaches will be analyzed in depth and compared to each other, exposing their various strengths and weaknesses. Finally, we will identify several open research problems and challenges related to automatic performance tuning for DSPSs.
Pages: 3194 - 3197
Number of pages: 4
Related papers
50 in total
  • [1] Towards Automatic Parameter Tuning of Stream Processing Systems
    Bilal, Muhammad
    Canini, Marco
    PROCEEDINGS OF THE 2017 SYMPOSIUM ON CLOUD COMPUTING (SOCC '17), 2017, : 189 - 200
  • [2] ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems
    Lian, Jinqing
    Zhang, Xinyi
    Shao, Yingxia
    Pu, Zenglin
    Xiang, Qingfeng
    Li, Yawen
    Cui, Bin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (13): : 4282 - 4295
  • [3] Reliable stream data processing for elastic distributed stream processing systems
    Xiaohui Wei
    Yuan Zhuang
    Hongliang Li
    Zhiliang Liu
    Cluster Computing, 2020, 23 : 555 - 574
  • [4] A Performance Benchmark for NetFlow Data Analysis on Distributed Stream Processing Systems
    Cermak, Milan
    Tovarnak, Daniel
    Lastovicka, Martin
    Celeda, Pavel
    NOMS 2016 - 2016 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2016, : 919 - 924
  • [5] Reliable stream data processing for elastic distributed stream processing systems
    Wei, Xiaohui
    Zhuang, Yuan
    Li, Hongliang
    Liu, Zhiliang
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2020, 23 (02): : 555 - 574
  • [6] Automatic workflow scheduling tuning for distributed processing systems
    Visheratin, Alexander A.
    Melnik, Mikhail
    Nasonov, Denis
    5TH INTERNATIONAL YOUNG SCIENTIST CONFERENCE ON COMPUTATIONAL SCIENCE, YSC 2016, 2016, 101 : 388 - 397
  • [7] Tracing Distributed Data Stream Processing Systems
    Zvara, Zoltan
    Szabo, Peter G. N.
    Hermann, Gabor
    Benczur, Andras
    2017 IEEE 2ND INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W), 2017, : 235 - 242
  • [8] Benchmarking Distributed Stream Data Processing Systems
    Karimov, Jeyhun
    Rabl, Tilmann
    Katsifodimos, Asterios
    Samarev, Roman
    Heiskanen, Henri
    Markl, Volker
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1507 - 1518
  • [9] Resource Estimation in Distributed Data Stream Processing Systems
    Fan, Minglu
    Liang, Yi
    Liu, Fei
    Yang, Mangmang
    Wang, Haihua
    PROCEEDINGS OF THE 2016 2ND WORKSHOP ON ADVANCED RESEARCH AND TECHNOLOGY IN INDUSTRY APPLICATIONS, 2016, 81 : 1824 - 1827
  • [10] Data-Trace Types for Distributed Stream Processing Systems
    Mamouras, Konstantinos
    Stanford, Caleb
    Alur, Rajeev
    Ives, Zachary G.
    Tannen, Val
    PROCEEDINGS OF THE 40TH ACM SIGPLAN CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '19), 2019, : 670 - 685