Simulation and optimization of HPC job allocation for jointly reducing communication and cooling costs

Cited by: 16
Authors
Meng, Jie [1]
McCauley, Samuel [2]
Kaplan, Fulya [1]
Leung, Vitus J. [3]
Coskun, Ayse K. [1]
Affiliations
[1] Boston Univ, Dept Elect & Comp Engn, 8 St Marys St, Boston, MA 02215 USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
[3] Sandia Natl Labs, Albuquerque, NM 87185 USA
Keywords
High-performance computing; Data center; Job allocation; Joint optimization; Cooling energy; Communication cost; Processor allocation
DOI
10.1016/j.suscom.2014.05.002
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Performance and energy are critical aspects in high performance computing (HPC) data centers. Highly parallel HPC applications that require multiple nodes usually run for long durations in the range of minutes, hours or days. As the threads of parallel applications communicate with each other intensively, the communication cost of these applications has a significant impact on data center performance. Energy consumption has also become a first-order constraint of HPC data centers. Nearly half of the energy in the computing clusters today is consumed by the cooling infrastructure. Existing job allocation policies either target improving the system performance or reducing the cooling energy cost of the server nodes. How to optimize the system performance while minimizing the cooling energy consumption is still an open question. This paper proposes a job allocation methodology aimed at jointly reducing the communication cost and the cooling energy of HPC data centers. In order to evaluate and validate our optimization algorithm, we implement our joint job allocation methodology in the Structural Simulation Toolkit (SST), a simulation framework for large-scale data centers. We evaluate our joint optimization algorithm using traces extracted from real-world workloads. Experimental results show that, in comparison to performance-aware job allocation algorithms, our algorithm achieves comparable running times and reduces the cooling power by up to 42.21% across all the jobs. © 2014 Elsevier Inc. All rights reserved.
Pages: 48-57
Number of pages: 10