Predictive modelling of MapReduce job performance in cloud environments using machine learning techniques

Times cited: 0
Authors
Bergui, Mohammed [1]
Hourri, Soufiane [1,2]
Najah, Said [1]
Nikolov, Nikola S. [3]
Affiliations
[1] Univ Sidi Mohammed Ben Abdellah, Fac Sci & Technol, Dept Comp Sci, Lab Intelligent Syst & Applicat, Fes, Morocco
[2] Univ Cadi Ayyad, Higher Sch Technol, Lab Proc Ind Signals & Comp Sci, Safi, Morocco
[3] Univ Limerick, Dept Comp Sci & Informat Syst, Limerick, Ireland
Keywords
Hadoop; MapReduce; Big data; Performance modelling; Runtime prediction; Machine learning
DOI
10.1186/s40537-024-00964-z
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Within the Hadoop ecosystem, MapReduce stands as a cornerstone for managing, processing, and mining large-scale datasets. Yet, the absence of efficient solutions for precise estimation of job execution times poses a persistent challenge, impacting task allocation and distribution within Hadoop clusters. In this study, we present a comprehensive machine learning approach for predicting the execution time of MapReduce jobs, encompassing data collection, preprocessing, feature engineering, and model evaluation. Leveraging a rich dataset derived from comprehensive Hadoop MapReduce job traces, we explore the intricate relationship between cluster parameters and job performance. Through a comparative analysis of machine learning models, including linear regression, decision tree, random forest, and gradient-boosted regression trees, we identify the random forest model as the most effective, demonstrating superior predictive accuracy and robustness. Our findings underscore the critical role of features such as data size and resource allocation in determining job performance. With this work, we aim to enhance resource management efficiency and enable more effective utilisation of cloud-based Hadoop clusters for large-scale data processing tasks.
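The abstract names random forest as the most accurate model but does not describe its implementation here. As an illustration only, the following is a minimal, dependency-free Python sketch of that technique (bootstrap-aggregated CART regression trees that consider a random subset of features at each split). Everything specific in it is an assumption: the feature set (input size in GB, map tasks, reduce tasks), the hyperparameter defaults, the helper names, and the synthetic demo data are hypothetical and are not taken from the authors' dataset or code.

```python
"""Minimal random-forest regressor for MapReduce job-runtime prediction.

Illustrative sketch only: pure Python, no external libraries. The feature
names and the synthetic data in the demo are hypothetical, not the paper's.
"""
import random
from typing import List, Sequence, Union

Tree = Union[float, tuple]  # leaf value, or (feature, threshold, left, right)


def _mean(ys: Sequence[float]) -> float:
    return sum(ys) / len(ys)


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (CART regression impurity)."""
    m = _mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, max_depth, n_feats, rng, depth=0) -> Tree:
    """Grow a regression tree greedily, minimising left SSE + right SSE."""
    if depth >= max_depth or len(y) < 2:
        return _mean(y)
    best = None  # (score, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset per split
        for t in sorted({row[f] for row in X})[:-1]:  # drop max so right side is non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled feature is constant in this node
        return _mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_tree = build_tree([X[i] for i in li], [y[i] for i in li], max_depth, n_feats, rng, depth + 1)
    right_tree = build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth, n_feats, rng, depth + 1)
    return (f, t, left_tree, right_tree)


def predict_tree(tree: Tree, row: Sequence[float]) -> float:
    """Route one feature vector from the root down to a leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


class RandomForest:
    """Bootstrap-aggregated regression trees with per-split feature sampling."""

    def __init__(self, n_trees: int = 20, max_depth: int = 6, n_feats=None, seed: int = 0):
        self.n_trees, self.max_depth, self.n_feats = n_trees, max_depth, n_feats
        self.rng = random.Random(seed)
        self.trees: List[Tree] = []

    def fit(self, X, y) -> "RandomForest":
        k = self.n_feats or max(1, len(X[0]) // 3)  # p/3: a common regression default
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, k, self.rng))
        return self

    def predict(self, X) -> List[float]:
        # Forest output is the average of the individual tree predictions.
        return [_mean([predict_tree(t, row) for t in self.trees]) for row in X]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical jobs: [input size (GB), map tasks, reduce tasks] -> runtime (s).
    X = [[rng.uniform(1, 100), rng.randint(4, 64), rng.randint(1, 16)] for _ in range(100)]
    y = [5 + 3.0 * gb + 40.0 / r + rng.gauss(0, 5) for gb, _, r in X]
    model = RandomForest(seed=1).fit(X[:80], y[:80])
    mae = _mean([abs(p - t) for p, t in zip(model.predict(X[80:]), y[80:])])
    base = _mean([abs(_mean(y[:80]) - t) for t in y[80:]])
    print(f"random forest MAE: {mae:.1f} s, mean-baseline MAE: {base:.1f} s")
```

The demo compares the forest against a trivial baseline that always predicts the training mean, mirroring in miniature the kind of model comparison the abstract describes. In practice an established implementation such as scikit-learn's `RandomForestRegressor` would be the usual choice; the hand-rolled version only makes the mechanics (bootstrapping, random feature subsets, averaging) visible.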
Pages: 20