Evaluation of Load Prediction Techniques for Distributed Stream Processing

Cited by: 4
Authors
Gontarska, Kordian [1, 2]
Geldenhuys, Morgan [2 ]
Scheinert, Dominik [2 ]
Wiesner, Philipp [2 ]
Polze, Andreas [1 ]
Thamsen, Lauritz [2 ]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
[2] Tech Univ Berlin, Berlin, Germany
Keywords
Distributed Stream Processing; Resource Management and Optimization; Load Prediction; Time Series Forecasting; Machine Learning
DOI
10.1109/IC2E52221.2021.00023
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends as well as cyclic and seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live migration of resources, and the tuning of configuration parameters at runtime, thus potentially leading to a better Quality of Service. In this paper, we conduct a comprehensive evaluation of different load prediction techniques for DSP jobs. We identify three use cases and formulate requirements for making load predictions specific to DSP jobs. Automatically optimized classical and Deep Learning methods are evaluated on nine different datasets from typical DSP domains, i.e., the IoT, Web 2.0, and cluster monitoring. We compare model performance with respect to overall accuracy and training duration. Our results show that the Deep Learning methods provide the most accurate load predictions for the majority of the evaluated datasets.
Pages: 91-98
Number of pages: 8