Predictive modelling of MapReduce job performance in cloud environments using machine learning techniques

Authors
Bergui, Mohammed [1]
Hourri, Soufiane [1, 2]
Najah, Said [1]
Nikolov, Nikola S. [3]
Affiliations
[1] Univ Sidi Mohammed Ben Abdellah, Fac Sci & Technol, Dept Comp Sci, Lab Intelligent Syst & Applicat, Fes, Morocco
[2] Univ Cadi Ayyad, Higher Sch Technol, Lab Proc Ind Signals & Comp Sci, Safi, Morocco
[3] Univ Limerick, Dept Comp Sci & Informat Syst, Limerick, Ireland
Keywords
Hadoop; MapReduce; Big data; Performance modelling; Runtime prediction; Machine learning
DOI
10.1186/s40537-024-00964-z
Chinese Library Classification
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Within the Hadoop ecosystem, MapReduce is a cornerstone for managing, processing, and mining large-scale datasets. Yet the lack of efficient methods for accurately estimating job execution times remains a persistent challenge, affecting how tasks are allocated and distributed within Hadoop clusters. In this study, we present a comprehensive machine learning approach for predicting the execution time of MapReduce jobs, covering data collection, preprocessing, feature engineering, and model evaluation. Using a rich dataset derived from Hadoop MapReduce job traces, we examine the relationship between cluster parameters and job performance. Through a comparative analysis of machine learning models, including linear regression, decision tree, random forest, and gradient-boosted regression trees, we identify the random forest model as the most effective, with the best predictive accuracy and robustness among the models compared. Our findings underscore the critical role of features such as data size and resource allocation in determining job performance. With this work, we aim to improve resource management efficiency and enable more effective use of cloud-based Hadoop clusters for large-scale data processing.
Pages: 20
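The abstract describes training several regressors on job-trace features and comparing their accuracy. As an illustration only, the Python sketch below shows that kind of comparison under stated assumptions: it uses scikit-learn and NumPy (not named in the record), and it replaces the authors' dataset with a hypothetical synthetic generator, `make_synthetic_traces`, whose features (data size, mapper and reducer counts, container memory) and runtime formula are invented for the example. It is not the paper's implementation, and its scores say nothing about the paper's results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical feature set, assumed for illustration only.
FEATURE_NAMES = ("data_size_gb", "n_mappers", "n_reducers", "container_mem_gb")


def make_synthetic_traces(n_jobs=500, seed=0):
    """Generate synthetic (features, runtime) pairs; NOT the paper's job traces."""
    rng = np.random.default_rng(seed)
    data_size_gb = rng.uniform(1, 100, n_jobs)
    n_mappers = rng.integers(4, 64, n_jobs)
    n_reducers = rng.integers(1, 16, n_jobs)
    container_mem_gb = rng.choice([1.0, 2.0, 4.0, 8.0], n_jobs)
    # Assumed runtime model: grows with data per task, shrinks with memory, plus noise.
    runtime_s = (
        30
        + 40 * data_size_gb / n_mappers
        + 5 * data_size_gb / n_reducers
        + 20 / container_mem_gb
        + rng.normal(0, 5, n_jobs)
    )
    X = np.column_stack([data_size_gb, n_mappers, n_reducers, container_mem_gb])
    return X, runtime_s


def compare_models(X, y, seed=0):
    """Fit the four model families named in the abstract and score them on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        entry = {"mae": mean_absolute_error(y_test, pred), "r2": r2_score(y_test, pred)}
        # Tree-based models expose impurity-based importances; linear regression does not.
        if hasattr(model, "feature_importances_"):
            entry["importances"] = dict(zip(FEATURE_NAMES, model.feature_importances_))
        scores[name] = entry
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_traces()
    for name, s in compare_models(X, y).items():
        print(f"{name:18s} MAE={s['mae']:.1f}s  R2={s['r2']:.3f}")
```

Running the script prints a held-out MAE and R² for each model. On real data, the synthetic generator would be replaced by features extracted from recorded job traces, and the random-forest `importances` entry gives a rough, model-specific view of which inputs drive the predicted runtime.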