Efficient Compute-Intensive Job Allocation in Data Centers via Deep Reinforcement Learning

Cited by: 30
Authors
Yi, Deliang [1]
Zhou, Xin [1]
Wen, Yonggang [1]
Tan, Rui [1]
Affiliations
[1] Nanyang Technological University, School of Computer Science and Engineering, Singapore 639798, Singapore
Keywords
Job allocation; data center; energy efficiency; deep reinforcement learning; energy-efficient
DOI
10.1109/TPDS.2020.2968427
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Reducing the energy consumption of the servers in a data center via proper job allocation is desirable. Existing advanced job allocation algorithms, based on constrained optimization formulations that capture servers' complex power consumption and thermal dynamics, often scale poorly with the data center size and the optimization horizon. This article applies deep reinforcement learning to build an allocation algorithm for the long-lasting, compute-intensive jobs that are increasingly common among today's computation demands. Specifically, a deep Q-network is trained to allocate jobs so as to maximize a cumulative reward over long horizons. Training is performed offline using a computational model, based on long short-term memory networks, that captures the servers' power and thermal dynamics. Training offline avoids the slow convergence, low energy efficiency, and potential server overheating that would arise if the agent explored its extensive state-action space by interacting directly with the physical data center, as in the commonly adopted online learning scheme. At run time, allocating a job requires only a forward pass through the trained Q-network, which costs little computation. Evaluation based on eight months of physical state and job arrival records from a national supercomputing data center hosting 1,152 processors shows that the solution reduces computing power consumption by more than 10 percent and processor temperature by more than 4 °C without sacrificing job processing throughput.
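The run-time step described in the abstract (a forward pass through a trained Q-network, followed by choosing a server) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-layer network shape, the random stand-in weights, the state layout, the `feasible` mask, and the function names `q_values` and `allocate` are all hypothetical.

```python
import numpy as np

def q_values(state, params):
    """Forward pass of a small two-layer Q-network (one Q-value per server)."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(0.0, state @ w1 + b1)  # ReLU hidden layer
    return hidden @ w2 + b2

def allocate(state, params, feasible):
    """Greedy allocation: pick the feasible server with the highest Q-value."""
    if not np.any(feasible):
        raise ValueError("no feasible server for this job")
    q = q_values(state, params)
    masked = np.where(feasible, q, -np.inf)  # exclude infeasible servers
    return int(np.argmax(masked))

# Hypothetical sizes and random stand-in weights; a real agent would load
# weights produced by offline training instead.
rng = np.random.default_rng(0)
n_servers, state_dim, hidden_dim = 4, 8, 16
params = (
    rng.normal(size=(state_dim, hidden_dim)),
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, n_servers)),
    np.zeros(n_servers),
)
state = rng.normal(size=state_dim)               # assumed state features
feasible = np.array([True, False, True, True])   # assumed capacity mask
server = allocate(state, params, feasible)
```

The mask is one simple way to keep the greedy choice within servers that can accept the job; how the paper handles capacity constraints is not stated in the abstract.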
Pages: 1474-1485
Page count: 12
Related papers
  • [1] Toward Efficient Compute-Intensive Job Allocation for Green Data Centers: A Deep Reinforcement Learning Approach
    Yi, Deliang
    Zhou, Xin
    Wen, Yonggang
    Tan, Rui
    [J]. 2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019, : 634 - 644
  • [2] A Compute-intensive Service Migration Strategy Based on Deep Reinforcement Learning Algorithm
    Cheng, Yongtao
    Li, Xuejing
    [J]. PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 1385 - 1388
  • [3] Data Centers Job Scheduling with Deep Reinforcement Learning
    Liang, Sisheng
    Yang, Zhou
    Jin, Fang
    Chen, Yong
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 906 - 917
  • [4] Network-aware compute and memory allocation in optically composable data centers with deep reinforcement learning and graph neural networks
    Shabka, Zacharaya
    Zervas, Georgios
    [J]. JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2023, 15 (02) : 133 - 143
  • [5] Power-efficient Computing for Compute-intensive GPGPU Applications
    Gilani, Syed Zohaib
    Kim, Nam Sung
    Schulte, Michael
    [J]. PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'12), 2012, : 445 - 446
  • [6] Power-efficient Computing for Compute-intensive GPGPU Applications
    Gilani, Syed Zohaib
    Kim, Nam Sung
    Schulte, Michael J.
    [J]. 19TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA2013), 2013, : 330 - 341
  • [7] Energy Efficient Task Offloading for Compute-intensive Mobile Edge Applications
    Zhang, Xiaojie
    Debroy, Saptarshi
    [J]. ICC 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2020,
  • [8] Multi-Dimensional Resource Allocation in Distributed Data Centers Using Deep Reinforcement Learning
    Wei, Wenting
    Gu, Huaxi
    Wang, Kun
    Li, Jianjia
    Zhang, Xuan
    Wang, Ning
    [J]. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2023, 20 (02) : 1817 - 1829
  • [9] Reinforcement learning based methodology for energy-efficient resource allocation in cloud data centers
    Thein, Thandar
    Myo, Myint Myat
    Parvin, Sazia
    Gawanmeh, Amjad
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2020, 32 (10) : 1127 - 1139
  • [10] Fast Reinforcement Learning Algorithms for Resource Allocation in Data Centers
    Jiang, Yuang
    Kodialam, Murali
    Lakshman, T. V.
    Mukherjee, Sarit
    Tassiulas, Leandros
    [J]. 2020 IFIP NETWORKING CONFERENCE AND WORKSHOPS (NETWORKING), 2020, : 271 - 279