EVALIX: Classification and Prediction of Job Resource Consumption on HPC Platforms

Cited by: 6
Authors
Emeras, Joseph [1]
Varrette, Sebastien [2]
Guzek, Mateusz [1]
Bouvry, Pascal [2]
Affiliations
[1] Interdisciplinary Centre for Security, Reliability and Trust, Luxembourg, Luxembourg
[2] Computer Science and Communications (CSC) Research Unit, 6 Rue Richard Coudenhove Kalergi, L-1359 Luxembourg, Luxembourg
Keywords
RJMS; HPC; Classification; Machine learning; ROC CURVE; KAPPA; AREA;
DOI
10.1007/978-3-319-61756-5_6
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
With the desired (or forced) convergence of High Performance Computing (HPC) platforms, stand-alone accelerators, and virtualized resources from Cloud Computing (CC) systems under way, this article presents the job prediction component of the Evalix project. The framework aims to improve the efficiency of the underlying Resource and Job Management System (RJMS) within heterogeneous HPC facilities through the automatic evaluation and characterization of the submitted workload. The objective is not only to better match scheduled jobs to the capabilities of the available resources, but also to reduce energy costs. To that end, we collected the resource consumption of all jobs executed on a production cluster over a period of three months. From the analysis and subsequent classification of these jobs, we derived a resource consumption model. This model is then used to train a set of predictors that estimate the CPU, memory, and I/O used by the jobs. The analysis of resource consumption showed that different classes of jobs have different kinds of resource needs, and the classification of the jobs made it possible to characterize several application patterns among the users. We also found that several users whose resource usage on the cluster is considered too low are responsible for a loss of CPU time on the order of five years over the three-month period considered. The predictors, trained with a supervised learning algorithm, were able to correctly classify a large set of data. We evaluated them with three performance indicators, which gave an information retrieval rate of 71% to 89% and a probability of accurate prediction between 0.7 and 0.8. The results of this work will be particularly helpful for designing an optimal partitioning of the heterogeneous platform considered, taking into account real application needs and thus leading to energy savings and performance improvements.
Moreover, beyond the novelty of the contribution, the accurate classification scheme offers new insights into user behavior that are of interest for the design of future HPC platforms.
Pages: 102-122
Number of pages: 21