A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications

Cited by: 2
Authors
Ataie, Ehsan [1 ,2 ]
Evangelinou, Athanasia [3 ]
Gianniti, Eugenio [3 ]
Ardagna, Danilo [3 ]
Affiliations
[1] Univ Mazandaran, Dept Comp Engn, Babolsar, Iran
[2] Univ Mazandaran, Distributed Comp Syst Res Grp, Babolsar, Iran
[3] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Source
COMPUTER JOURNAL | 2022 / Vol. 65 / Issue 12
Keywords
analytical performance modeling; machine learning; cloud computing; MapReduce; Hadoop; Tez; Spark;
DOI
10.1093/comjnl/bxab131
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Apache Hadoop and Apache Spark are currently two of the most prominent distributed solutions on the market for processing big data applications. Since in many cases these frameworks are adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds. The approach exploits both analytical modeling and machine learning (ML) techniques, and it achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves on ML techniques applied without any support from analytical models in terms of accuracy, number of experiments to be run on the operational system, and cost. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate performance in interpolating scenarios, while it fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
Pages: 3123 - 3140
Page count: 18
Related papers
50 in total
  • [41] An approach for economic evaluation of cloud-based applications
    Pena-Ortiz, Raul
    Domenech, Josep
    Gil, Jose A.
    Pont, Ana
    2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD NETWORKING (CLOUDNET), 2014, : 281 - 287
  • [42] Technical and Legal Strategic Approaches Protecting the Privacy of Personal Data in Cloud-Based Big Data Applications
    Arikan, Suleyman Muhammed
    2022 10TH INTERNATIONAL SYMPOSIUM ON DIGITAL FORENSICS AND SECURITY (ISDFS), 2022,
  • [43] ACO-Inspired Load Balancing Strategy for Cloud-Based Data Centre with Predictive Machine Learning Approach
    Dey, Niladri
    Gunasekhar, T.
    Purnachand, K.
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (01): : 513 - 529
  • [44] I/O Performance Modeling for Big Data Applications over Cloud Infrastructures
    Mytilinis, Ioannis
    Tsoumakos, Dimitrios
    Kantere, Verena
    Nanos, Anastassios
    Koziris, Nectarios
    2015 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2015), 2015, : 201 - 206
  • [45] Predicting the performance of big data applications on the cloud
    Ardagna, D.
    Barbierato, E.
    Gianniti, E.
    Gribaudo, M.
    Pinto, T. B. M.
    da Silva, A. P. C.
    Almeida, J. M.
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (02): : 1321 - 1353
  • [47] Towards Cloud-Based Data Warehouse as a Service for Big Data Analytics
    Dabbechi, Hichem
    Nabli, Ahlem
    Bouzguenda, Lotfi
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2016, PT II, 2016, 9876 : 180 - 189
  • [48] A Cloud-based Architecture for Condition Monitoring based on Machine Learning
    Arevalo, Fernando
    Diprasetya, Mochammad Rizky
    Schwung, Andreas
    2018 IEEE 16TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2018, : 163 - 168
  • [49] Machine learning with big data analytics for cloud security
    Mohammad, Abdul Salam
    Pradhan, Manas Ranjan
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 96
  • [50] A Performance Analysis of MapReduce Applications on Big Data in Cloud based Hadoop
    Gohil, Parth
    Garg, Dweepna
    Panchal, Bakul
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,