Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directions

Cited by: 0
Authors
Zhengxiong Hou
Hong Shen
Xingshe Zhou
Jianhua Gu
Yunlan Wang
Tianhai Zhao
Affiliations
[1] Northwestern Polytechnical University, Center for High Performance Computing, School of Computer Science
[2] Sun Yat-Sen University, School of Computer Science and Engineering
Keywords
high-performance computing; performance prediction; job characteristics; intelligent resource allocation; cloud computing; machine learning
DOI
Not available
Abstract
Nowadays, high-performance computing (HPC) clusters are increasingly popular, and large volumes of job logs recording many years of operation traces have been accumulated. At the same time, the HPC cloud makes it possible to access HPC services remotely. To execute applications, both HPC end-users and cloud users must themselves request specific resources for different workloads. Because users are usually unfamiliar with the hardware details, the software layers, and the performance behavior of the underlying HPC systems, it is hard for them to select optimal resource configurations in terms of performance, cost, and energy efficiency. Hence, how to provide on-demand services with intelligent resource allocation is a critical issue in the HPC community, and prediction of job characteristics plays a key role in intelligent resource allocation. This paper presents a survey of the existing work and future directions for prediction of job characteristics for intelligent resource allocation in HPC systems. We first review the existing techniques for obtaining performance and energy consumption data of jobs. Then we survey the techniques for single-objective predictions of runtime, queue time, power and energy consumption, cost, and optimal resource configuration for input jobs, as well as multi-objective predictions. Finally, we discuss future trends, research challenges, and possible solutions toward intelligent resource allocation in HPC systems.
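As a hedged illustration of what predicting one job characteristic (runtime) from accumulated job logs can look like, here is a minimal history-based baseline: the estimate is the mean of the same user's k most recent completed runtimes, capped at the walltime the new job requested, with the request itself used when no history exists. This sketch is not presented as the paper's method. The `Job` record, its field names, the `HistoryRuntimePredictor` class, and the default k = 2 are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Job:
    """Hypothetical job-log record; field names are illustrative only."""
    user: str
    requested_walltime: float        # seconds requested at submission
    runtime: Optional[float] = None  # actual runtime, known only after completion


class HistoryRuntimePredictor:
    """Hypothetical baseline runtime predictor built from each user's job history.

    Prediction = mean of the user's last k completed runtimes, capped at the
    walltime the job requested; with no history, the request itself is used.
    """

    def __init__(self, k: int = 2) -> None:
        self.k = k
        # One bounded queue per user: older runtimes drop out automatically.
        self.history: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=k))

    def predict(self, job: Job) -> float:
        past = self.history.get(job.user)
        if not past:
            # No completed jobs from this user yet: fall back to the request.
            return job.requested_walltime
        estimate = sum(past) / len(past)
        # Schedulers typically stop jobs at their requested walltime, so cap there.
        return min(estimate, job.requested_walltime)

    def record(self, job: Job) -> None:
        """Add a finished job's actual runtime to its user's history."""
        if job.runtime is not None:
            self.history[job.user].append(job.runtime)


if __name__ == "__main__":
    predictor = HistoryRuntimePredictor(k=2)
    finished = [Job("alice", 3600.0, 600.0), Job("alice", 3600.0, 1000.0)]
    new_job = Job("alice", 3600.0)
    print(predictor.predict(new_job))  # 3600.0: no history yet
    for job in finished:
        predictor.record(job)
    print(predictor.predict(new_job))  # 800.0: mean of 600 and 1000
```

The bounded `deque` keeps only the k most recent runtimes per user, so the estimate follows changes in a user's behavior. A learned model could, in principle, replace `predict` while keeping the same record/predict interface.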