Automatic Performance Tuning for Distributed Data Stream Processing Systems

Cited by: 7
Authors
Herodotou, Herodotos [1]
Odysseos, Lambros [1]
Chen, Yuxing [2]
Lu, Jiaheng [3]
Affiliations
[1] Cyprus Univ Technol, Limassol, Cyprus
[2] Tencent Inc, Shenzhen, Peoples R China
[3] Univ Helsinki, Helsinki, Finland
Keywords
Performance tuning; data stream processing; parameter tuning; Storm; Flink; Spark Streaming
DOI
10.1109/ICDE53745.2022.00296
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distributed data stream processing systems (DSPSs) such as Storm, Flink, and Spark Streaming are now routinely used to process continuous data streams in (near) real-time. However, achieving the low latency and high throughput demanded by today's streaming applications can be a daunting task, especially since the performance of DSPSs highly depends on a large number of system parameters that control load balancing, degree of parallelism, buffer sizes, and various other aspects of system execution. This tutorial offers a comprehensive review of the state-of-the-art automatic performance tuning approaches that have been proposed in recent years. The approaches are organized into five main categories based on their methodologies and features: cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. The categories of approaches will be analyzed in depth and compared to each other, exposing their various strengths and weaknesses. Finally, we will identify several open research problems and challenges related to automatic performance tuning for DSPSs.
Pages: 3194-3197
Number of pages: 4