KORDI: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming

Times cited: 1
Authors
Kordelas, Athanasios [1, 2]
Spyrou, Thanasis
Voulgaris, Spyros [4]
Megalooikonomou, Vasileios [2]
Deligiannis, Nikos [1, 3]
Affiliations
[1] Vrije Univ Brussel, Dept Elect & Informat ETRO, B-1050 Brussels, Belgium
[2] Univ Patras, Comp Engn & Informat Dept, Patras 26500, Greece
[3] Imec, Kapeldreef 75, B-3001 Leuven, Belgium
[4] Athens Univ Econ & Business, Athens, Greece
Keywords
Big data; Apache Spark; Machine learning for resource allocation; Real-time forecasting for cost reduction; Energy-efficient; Model
DOI
10.1109/ISPASS57527.2023.00045
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Apache Spark is one of the most commonly used frameworks for Big Data processing. Research on Spark's built-in dynamic resource allocation feature for streaming has shown that large fluctuations in data load, for instance in website traffic, degrade automatic scaling. Research has also indicated that the root cause of this issue is the lack of data-load prediction, i.e., anticipating the expected increase in data load during peak hours or days. Hence, this paper proposes an enhanced solution, KORDI (Knowledge-based Orchestrated Resource DIstribution), which optimises the allocation of Spark resources for streaming applications in real time using a SARIMAX model. The experimental evaluation shows that the proposed solution reduces cost by 38% without affecting stability.
Pages: 337-339
Number of pages: 3
Related papers (50 in total)
  • [1] Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark
    Armbrust, Michael
    Das, Tathagata
    Torres, Joseph
    Yavuz, Burak
    Zhu, Shixiong
    Xin, Reynold
    Ghodsi, Ali
    Stoica, Ion
    Zaharia, Matei
    SIGMOD'18: Proceedings of the 2018 International Conference on Management of Data, 2018, pp. 601-613
  • [2] Real-Time Heart Arrhythmia Detection Using Apache Spark Structured Streaming
    Ilbeigipour, Sadegh
    Albadvi, Amir
    Akhondzadeh Noughabi, Elham
    Journal of Healthcare Engineering, 2021, vol. 2021
  • [3] Real-time Data Streaming using Apache Spark on Fully Configured Hadoop Cluster
    Prasad, Kashi Sai
    Pasupathy, S.
    Journal of Mechanics of Continua and Mathematical Sciences, 2018, 13(5), pp. 164-176
  • [4] Real-Time Regex Matching With Apache Spark
    Deaton, Sean
    Brownfield, David
    Kosta, Leonard
    Zhu, Zhaozhong
    Matthews, Suzanne J.
    2017 IEEE High Performance Extreme Computing Conference (HPEC), 2017
  • [5] Real-time Processing of IoT Events with Historic data using Apache Kafka and Apache Spark with Dashing framework
    D'silva, Godson Michael
    Khan, Azharuddin
    Joshi, Gaurav
    Bari, Siddhesh
    2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2017, pp. 1804-1809
  • [6] Real-time incremental recommendation for streaming data based on apache flink
    Tang, Zhuo
    Liu, Zeyu
    Li, Kenli
    Li, Keqin
    Intelligent Data Analysis, 2019, 23(6), pp. 1421-1437
  • [7] Stock Market Real Time Recommender Model Using Apache Spark Framework
    Seif, Mostafa Mohamed
    Hamed, Essam M. Ramzy
    Hegazy, Abd El Fatah Abdel Ghfar
    International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018), 2018, vol. 723, pp. 671-683
  • [8] A spark-based big data analysis framework for real-time sentiment prediction on streaming data
    Kilinc, Deniz
    Software: Practice and Experience, 2019, 49(9), pp. 1352-1364
  • [9] A Novel Real-Time LiDAR Data Streaming Framework
    Anand, Bhaskar
    Kambhampaty, Harish Rohan
    Rajalakshmi, Pachamuthu
    IEEE Sensors Journal, 2022, 22(23), pp. 23476-23485
  • [10] Real-time user clickstream behavior analysis based on apache storm streaming
    Pal, Gautam
    Atkinson, Katie
    Li, Gangmin
    Electronic Commerce Research, 2023, 23(3), pp. 1829-1859