A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 2
Authors
Ataie, Ehsan [1, 2]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Source
COMPUTER JOURNAL | 2022 / Vol. 65 / Issue 12
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark;
DOI
10.1093/comjnl/bxab131
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Nowadays, Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since in many cases these frameworks are adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end-users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds, which exploits both analytical modeling and machine learning (ML) techniques and is able to achieve good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves accuracy, the number of experiments to be run on the operational system, and cost, compared with applying ML techniques without any support from analytical models. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate performance in interpolating scenarios, while it fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123 - 3140
Page count: 18
Related papers
50 in total
  • [31] Long-Term Spectrum Monitoring with Big Data Analysis and Machine Learning for Cloud-Based Radio Access Networks
    Pavel Baltiiski
    Ilia Iliev
    Boian Kehaiov
    Vladimir Poulkov
    Todor Cooklev
    Wireless Personal Communications, 2016, 87 : 815 - 835
  • [32] Long-Term Spectrum Monitoring with Big Data Analysis and Machine Learning for Cloud-Based Radio Access Networks
    Baltiiski, Pavel
    Iliev, Ilia
    Kehaiov, Boian
    Poulkov, Vladimir
    Cooklev, Todor
    WIRELESS PERSONAL COMMUNICATIONS, 2016, 87 (03) : 815 - 835
  • [33] Strategic alignment of Cloud-based Architectures for Big Data
    Schmidt, Rainer
    Moehring, Michael
    17TH IEEE INTERNATIONAL ENTERPRISE DISTRIBUTED OBJECT COMPUTING CONFERENCE WORKSHOPS (EDOCW 2013), 2013, : 136 - 143
  • [34] Distributed and Cloud-based Big Data Analytics and Fusion
    Das, Subrata
    SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION XXII, 2013, 8745
  • [35] Pipeline provenance for cloud-based big data analytics
    Wang, Ruoyu
    Sun, Daniel
    Li, Guoqiang
    Wong, Raymond
    Chen, Shiping
    SOFTWARE-PRACTICE & EXPERIENCE, 2020, 50 (05) : 658 - 674
  • [36] Efficient Cloud-Based Framework for Big Data Classification
    Pakdel, Rezvan
    Herbert, John
    PROCEEDINGS 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2016), 2016, : 195 - 201
  • [37] A Cloud-based Network Architecture for Big Data Services
    Zhao, Ming
    Kumar, Arun
    Ali, G. G. Md. Nawaz
    Chong, Peter Han Joo
    2016 IEEE 14TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 14TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 2ND INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/DATACOM/CYBERSC, 2016, : 654 - 659
  • [38] PhishNot: A Cloud-Based Machine-Learning Approach to Phishing URL Detection
    Alani, Mohammed M.
    Tawfik, Hissam
    COMPUTER NETWORKS, 2022, 218
  • [39] Balance Deficits due to Cerebellar Ataxia: A Machine Learning and Cloud-Based Approach
    Ngo, Thang
    Pathirana, Pubudu N.
    Horne, Malcolm K.
    Power, Laura
    Szmulewicz, David J.
    Milne, Sarah C.
    Corben, Louise A.
    Roberts, Melissa
    Delatycki, Martin B.
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2021, 68 (05) : 1507 - 1517
  • [40] Modeling of performance evaluation of educational information based on big data deep learning and cloud platform
    Ye, Jun
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (06) : 7155 - 7165