Extending SLURM for Dynamic Resource-Aware Adaptive Batch Scheduling

Cited by: 11
Authors
Chadha, Mohak [1]
John, Jophin [1]
Gerndt, Michael [1]
Affiliations
[1] Technische Universität München, Garching near Munich, Computer Architecture & Parallel Systems, Munich, Germany
Keywords
Dynamic resource-management; malleability; SLURM; performance-aware; power-aware scheduling
DOI
10.1109/HiPC50609.2020.00036
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With growing constraints on power budgets and increasing hardware failure rates, the operation of future exascale systems faces several challenges. To address these challenges, resource awareness and adaptivity through malleable jobs have been actively researched in the HPC community. Malleable jobs can change their computing resources at runtime and can significantly improve HPC system performance. However, due to the rigid nature of popular parallel programming paradigms such as MPI and the lack of support for dynamic resource management in batch systems, malleable jobs have remained largely unrealized. In this paper, we extend the SLURM batch system to support the execution and batch scheduling of malleable jobs. The malleable applications are written using a new adaptive parallel paradigm called Invasive MPI, which extends the MPI standard to support resource adaptivity at runtime. We propose two malleable job scheduling strategies to support performance-aware and power-aware dynamic reconfiguration decisions at runtime. We implement the strategies in SLURM and evaluate them on a production HPC system. Results for our performance-aware scheduling strategy show improvements in makespan, average system utilization, and average response and waiting times compared to other scheduling strategies. Moreover, we demonstrate dynamic power corridor management using our power-aware strategy.
Pages: 223-232
Page count: 10