Optimization power consumption model of reliability-aware GPU clusters

Cited by: 0
Authors
Haifeng Wang
Qingkui Chen
Affiliations
[1] University of Shanghai for Science and Technology, Shanghai, School of Management
[2] LinYi University, Information School
[3] University of Shanghai for Science and Technology, School of Optical
Keywords
Power consumption optimization; Reliability; GPU clusters; Model prediction control
DOI: not available
Abstract
Power control for reliability-aware GPU clusters with dynamically variable voltage and speed is investigated as a combinatorial optimization problem with two variants: minimizing task execution time under an energy consumption constraint, and minimizing energy consumption under a system reliability constraint. Both problems apply to general multiprocessor computing and real-time multiprocessing systems, where energy consumption and system reliability are both important. These problems, which emphasize the trade-off among performance, power, and reliability, have not been well studied before. In this research, a novel power control model is built based on Model Prediction Control theory. The Maximum Entropy Method is used to determine the partial ordering relation of the control variables and to assess the quality of solutions. Our controller can cap redundant energy consumption by dynamically switching the energy states of the nodes in the GPU cluster. We compare our controller with a control scheme that does not consider system reliability. The experimental results demonstrate that the proposed controller is more reliable and valuable.
Pages: 153-174
Page count: 21