A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 2
Authors
Ataie, Ehsan [1, 2]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Source
COMPUTER JOURNAL | 2022 / Vol. 65 / Issue 12
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark;
DOI
10.1093/comjnl/bxab131
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Nowadays, Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since in many cases these frameworks are adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end-users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds, which exploits both analytical modeling and machine learning (ML) techniques and achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves on applying ML techniques without any support from analytical models in terms of accuracy, number of experiments to be run on the operational system, and cost. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate the performance in interpolating scenarios, while it fails to predict the performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123 - 3140
Number of pages: 18
Related papers
50 in total
  • [1] A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications (Sept, 10.1093/comjnl/bxab131, 2021)
    Ataie, Ehsan
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Ardagna, Danilo
    COMPUTER JOURNAL, 2023, 66 (02): : 524 - 524
  • [2] Cloud-based Machine Learning Tools for Enhanced Big Data Applications
    Cuzzocrea, Alfredo
    Mumolo, Enzo
    Corona, Pietro
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 908 - 914
  • [3] Erratum to: A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications (The Computer Journal DOI: 10.1093/comjnl/bxab131)
    Ataie, Ehsan
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Ardagna, Danilo
    Computer Journal, 2023, 66 (02):
  • [4] Performance Prediction of Cloud-Based Big Data Applications
    Ardagna, Danilo
    Barbierato, Enrico
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Gribaudo, Marco
    Pinto, Tulio B. M.
    Guimaraes, Anna
    da Silva, Ana Paula Couto
    Almeida, Jussara M.
    PROCEEDINGS OF THE 2018 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE '18), 2018, : 192 - 199
  • [5] Perspectives on Big Data, Cloud-Based Data Analysis and Machine Learning Systems
    Marozzo, Fabrizio
    Talia, Domenico
    BIG DATA AND COGNITIVE COMPUTING, 2023, 7 (02)
  • [6] Memory Scaling of Cloud-Based Big Data Systems: A Hybrid Approach
    Wang, Xinying
    Xu, Cong
    Wang, Ke
    Yan, Feng
    Zhao, Dongfang
    IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (05) : 1259 - 1272
  • [7] Performance-Aware Refactoring of Cloud-based Big Data Applications
    Li, Chen
    Casale, Giuliano
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 1505 - 1510
  • [8] A Cloud-Based Framework for Machine Learning Workloads and Applications
    Lopez Garcia, Alvaro
    Marco De Lucas, Jesus
    Antonacci, Marica
    Zu Castell, Wolfgang
    David, Mario
    Hardt, Marcus
    Lloret Iglesias, Lara
    Molto, German
    Plociennik, Marcin
    Viet Tran
    Alic, Andy S.
    Caballer, Miguel
    Campos Plasencia, Isabel
    Costantini, Alessandro
    Dlugolinsky, Stefan
    Duma, Doina Cristina
    Donvito, Giacinto
    Gomes, Jorge
    Heredia Cacha, Ignacio
    Ito, Keiichi
    Kozlov, Valentin Y.
    Giang Nguyen
    Orviz Fernandez, Pablo
    Sustr, Zdenek
    Wolniewicz, Pawel
    IEEE ACCESS, 2020, 8 (08): : 18681 - 18692
  • [9] Performance prediction of parallel computing models to analyze cloud-based big data applications
    Shen, Chao
    Tong, Weiqin
    Choo, Kim-Kwang Raymond
    Kausar, Samina
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2018, 21 (02): : 1439 - 1454
  • [10] Performance prediction of parallel computing models to analyze cloud-based big data applications
    Chao Shen
    Weiqin Tong
    Kim-Kwang Raymond Choo
    Samina Kausar
    Cluster Computing, 2018, 21 : 1439 - 1454