Optimization power consumption model of reliability-aware GPU clusters

Cited by: 0
Authors
Haifeng Wang
Qingkui Chen
Affiliations
[1] University of Shanghai for Science and Technology, School of Management, Shanghai
[2] LinYi University, Information School
[3] University of Shanghai for Science and Technology, School of Optical
Keywords
Power consumption optimization; Reliability; GPU clusters; Model prediction control
DOI
Not available
Abstract
Power control on reliability-aware GPU clusters with dynamically variable voltage and speed is investigated as a combinatorial optimization problem in two forms: minimizing task execution time subject to an energy consumption constraint, and minimizing energy consumption subject to a system reliability constraint. Both problems arise in general multiprocessor computing and in real-time multiprocessing systems, where energy consumption and system reliability are both important. Problems of this kind, which emphasize the trade-off among performance, power, and reliability, have not been well studied before. In this research, a novel power control model is built on Model Predictive Control (MPC) theory. The Maximum Entropy Method is used to determine the partial ordering of the control variables and to evaluate the quality of candidate solutions. Our controller caps redundant energy consumption by dynamically switching the energy states of the nodes in the GPU cluster. We compare our controller with a control scheme that does not consider system reliability. The experimental results demonstrate that the proposed controller is more reliable and more valuable than that reliability-unaware scheme.
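The abstract does not state the paper's power, timing, or reliability models. The sketch below is a minimal illustration that assumes a DVFS formulation common in the reliability-aware power management literature. Node power is static power plus a cubic dynamic term. The transient-fault rate grows exponentially as frequency is lowered, and system reliability is the product of per-node no-fault probabilities. Under those assumptions it states the two optimization problems as exhaustive searches over discrete per-node frequency levels, with one task per node. Every constant, `FREQ_LEVELS`, and every function name here is hypothetical. The sketch does not reproduce the paper's MPC controller or its Maximum Entropy ranking.

```python
import itertools
import math

# Hypothetical normalized frequency levels (fraction of f_max); not from the paper.
FREQ_LEVELS = (0.4, 0.6, 0.8, 1.0)
F_MIN = min(FREQ_LEVELS)

# Assumed model constants, chosen only for illustration.
P_STATIC = 0.1   # frequency-independent power per node
C_EF = 1.0       # effective switching capacitance; dynamic power = C_EF * f**3
LAMBDA0 = 1e-6   # assumed transient-fault rate at f_max (faults per time unit)
D = 2.0          # assumed sensitivity of the fault rate to frequency scaling


def task_time(cycles, f):
    """Execution time of a task with `cycles` work at normalized frequency f."""
    return cycles / f


def task_energy(cycles, f):
    """Energy = (static + dynamic power) * execution time."""
    return (P_STATIC + C_EF * f ** 3) * task_time(cycles, f)


def task_reliability(cycles, f):
    """Probability of no transient fault; the fault rate rises as f drops."""
    lam = LAMBDA0 * 10 ** (D * (1 - f) / (1 - F_MIN))
    return math.exp(-lam * task_time(cycles, f))


def evaluate(workloads, freqs):
    """Return (makespan, total energy, system reliability), one task per node."""
    pairs = list(zip(workloads, freqs))
    makespan = max(task_time(c, f) for c, f in pairs)
    energy = sum(task_energy(c, f) for c, f in pairs)
    reliability = math.prod(task_reliability(c, f) for c, f in pairs)
    return makespan, energy, reliability


def min_energy_under_reliability(workloads, r_min):
    """Minimize total energy subject to system reliability >= r_min.

    Returns (freqs, energy), or None if no frequency assignment is feasible.
    """
    best = None
    for freqs in itertools.product(FREQ_LEVELS, repeat=len(workloads)):
        _, energy, rel = evaluate(workloads, freqs)
        if rel >= r_min and (best is None or energy < best[1]):
            best = (freqs, energy)
    return best


def min_time_under_energy(workloads, e_max):
    """Minimize makespan subject to total energy <= e_max.

    Returns (freqs, makespan), or None if no frequency assignment is feasible.
    """
    best = None
    for freqs in itertools.product(FREQ_LEVELS, repeat=len(workloads)):
        makespan, energy, _ = evaluate(workloads, freqs)
        if energy <= e_max and (best is None or makespan < best[1]):
            best = (freqs, makespan)
    return best


if __name__ == "__main__":
    workloads = [100.0, 80.0, 60.0]  # hypothetical per-node work
    print(min_energy_under_reliability(workloads, r_min=0.999))
    print(min_time_under_energy(workloads, e_max=150.0))
```

Exhaustive search is used only to keep the two constraint forms easy to read. It grows as `len(FREQ_LEVELS) ** nodes`, so a realistic cluster would need a heuristic or a receding-horizon solver, as the abstract's MPC approach suggests.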
Pages: 153-174
Number of pages: 21