Automatic Performance Tuning for Distributed Data Stream Processing Systems

Cited by: 7
Authors
Herodotou, Herodotos [1]
Odysseos, Lambros [1]
Chen, Yuxing [2]
Lu, Jiaheng [3]
Affiliations
[1] Cyprus University of Technology, Limassol, Cyprus
[2] Tencent Inc., Shenzhen, People's Republic of China
[3] University of Helsinki, Helsinki, Finland
Keywords
Performance tuning; data stream processing; parameter tuning; Storm; Flink; Spark Streaming
DOI
10.1109/ICDE53745.2022.00296
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Distributed data stream processing systems (DSPSs) such as Storm, Flink, and Spark Streaming are now routinely used to process continuous data streams in (near) real-time. However, achieving the low latency and high throughput demanded by today's streaming applications can be a daunting task, especially since the performance of DSPSs highly depends on a large number of system parameters that control load balancing, degree of parallelism, buffer sizes, and various other aspects of system execution. This tutorial offers a comprehensive review of the state-of-the-art automatic performance tuning approaches that have been proposed in recent years. The approaches are organized into five main categories based on their methodologies and features: cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. The categories of approaches will be analyzed in depth and compared to each other, exposing their various strengths and weaknesses. Finally, we will identify several open research problems and challenges related to automatic performance tuning for DSPSs.
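To make the "experiment-driven" category more concrete, here is a minimal Python sketch. It is not taken from the tutorial. For every combination of two parameters that the abstract names, degree of parallelism and buffer size, it runs one experiment and keeps the highest-throughput configuration whose latency stays within a bound. Three things are assumptions. `run_benchmark` is a hypothetical stand-in that evaluates a toy synthetic formula instead of deploying a real Storm, Flink, or Spark Streaming job. The parameter names `parallelism` and `buffer_kb` are illustrative, not configuration keys of any of these systems. Exhaustive grid search is used only because it is the simplest strategy; each real experiment is costly, so a practical tuner would usually replace the grid with a more sample-efficient search.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]

# Hypothetical search space; the names are illustrative, not real config keys.
SEARCH_SPACE: Dict[str, List[int]] = {
    "parallelism": [1, 2, 4, 8],
    "buffer_kb": [32, 64, 128],
}


def run_benchmark(config: Config) -> Tuple[float, float]:
    """Hypothetical stand-in for one tuning experiment.

    A real experiment would deploy the streaming job with `config` and measure
    throughput (records/s) and latency (ms). A toy formula is used instead.
    """
    p = config["parallelism"]
    b = config["buffer_kb"]
    # Toy model: throughput rises with parallelism (diminishing returns) and buffer size.
    throughput = 10_000 * p / (1 + 0.1 * p) * (1 + b / 256)
    # Toy model: latency rises with buffer size and with parallelism (coordination overhead).
    latency = 5 + b / 16 + 2 * p
    return throughput, latency


def experiment_driven_tune(
    space: Dict[str, List[int]],
    benchmark: Callable[[Config], Tuple[float, float]],
    max_latency_ms: float,
) -> Optional[Tuple[Config, float, float]]:
    """Grid search: run one experiment per configuration and return the
    highest-throughput one that meets the latency bound (None if none does)."""
    names = sorted(space)
    best: Optional[Tuple[Config, float, float]] = None
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        throughput, latency = benchmark(config)
        if latency > max_latency_ms:
            continue  # violates the latency requirement, so discard it
        if best is None or throughput > best[1]:
            best = (config, throughput, latency)
    return best


if __name__ == "__main__":
    print(experiment_driven_tune(SEARCH_SPACE, run_benchmark, max_latency_ms=26))
```

With a 26 ms latency bound, the sketch selects `parallelism=8, buffer_kb=64`. Under the toy model, a 128 KB buffer would raise throughput further, but it also pushes latency to 29 ms. This illustrates the throughput/latency trade-off that the tuning approaches must navigate.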
Pages: 3194-3197
Number of pages: 4