Predictive modelling of MapReduce job performance in cloud environments using machine learning techniques

Cited by: 0
Authors
Bergui, Mohammed [1 ]
Hourri, Soufiane [1 ,2 ]
Najah, Said [1 ]
Nikolov, Nikola S. [3 ]
Affiliations
[1] Univ Sidi Mohammed Ben Abdellah, Fac Sci & Technol, Dept Comp Sci, Lab Intelligent Syst & Applicat, Fes, Morocco
[2] Univ Cadi Ayyad, Higher Sch Technol, Lab Proc Ind Signals & Comp Sci, Safi, Morocco
[3] Univ Limerick, Dept Comp Sci & Informat Syst, Limerick, Ireland
Keywords
Hadoop; MapReduce; Big data; Performance modelling; Runtime prediction; Machine learning;
DOI
10.1186/s40537-024-00964-z
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Within the Hadoop ecosystem, MapReduce stands as a cornerstone for managing, processing, and mining large-scale datasets. Yet, the absence of efficient solutions for precise estimation of job execution times poses a persistent challenge, impacting task allocation and distribution within Hadoop clusters. In this study, we present a comprehensive machine learning approach for predicting the execution time of MapReduce jobs, encompassing data collection, preprocessing, feature engineering, and model evaluation. Leveraging a rich dataset derived from comprehensive Hadoop MapReduce job traces, we explore the intricate relationship between cluster parameters and job performance. Through a comparative analysis of machine learning models, including linear regression, decision tree, random forest, and gradient-boosted regression trees, we identify the random forest model as the most effective, demonstrating superior predictive accuracy and robustness. Our findings underscore the critical role of features such as data size and resource allocation in determining job performance. With this work, we aim to enhance resource management efficiency and enable more effective utilisation of cloud-based Hadoop clusters for large-scale data processing tasks.
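The workflow the abstract outlines (collect job traces, engineer features, fit a model, evaluate on held-out jobs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the job records are synthetic, the runtime formula in make_synthetic_jobs is a toy relationship, and a single-feature least-squares line stands in for the models compared in the paper (it is not the authors' random forest, dataset, or feature set). The sketch only shows how an engineered feature such as input size per container can be scored against a mean-runtime baseline using RMSE.

```python
import math
import random


def make_synthetic_jobs(n, seed=0):
    """Generate hypothetical (input_gb, containers, runtime_s) job records."""
    rng = random.Random(seed)
    jobs = []
    for _ in range(n):
        input_gb = rng.uniform(1.0, 100.0)
        containers = rng.randint(2, 32)
        # Assumed toy relationship: fixed overhead plus work split across containers.
        runtime_s = 30.0 + 12.0 * input_gb / containers + rng.gauss(0.0, 2.0)
        jobs.append((input_gb, containers, runtime_s))
    return jobs


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def gb_per_container(job):
    """Engineered feature: input data handled by each container."""
    return job[0] / job[1]


def evaluate(seed=0):
    """Return (baseline_rmse, model_rmse) on a held-out split of synthetic jobs."""
    jobs = make_synthetic_jobs(400, seed)
    train, test = jobs[:300], jobs[300:]

    slope, intercept = fit_line(
        [gb_per_container(j) for j in train], [j[2] for j in train]
    )
    actual = [j[2] for j in test]

    # Baseline: always predict the mean training runtime.
    mean_runtime = sum(j[2] for j in train) / len(train)
    baseline_rmse = rmse([mean_runtime] * len(test), actual)

    model_rmse = rmse(
        [slope * gb_per_container(j) + intercept for j in test], actual
    )
    return baseline_rmse, model_rmse


if __name__ == "__main__":
    baseline, model = evaluate()
    print(f"baseline RMSE: {baseline:.2f} s, linear model RMSE: {model:.2f} s")
```

On real traces the same held-out comparison would be repeated for each candidate model; a tree ensemble would replace fit_line, and the feature set would come from the recorded cluster and job parameters rather than a formula.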
Pages: 20