Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning

Cited by: 0
Authors
Sen, Tanmoy [1]
Shen, Haiying [1]
Affiliations
[1] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
Keywords
IoT
DOI
10.1109/ICDCS54860.2022.00062
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
With the emergence of edge devices along with their local computation advantage over the cloud, distributed deep learning (DL) training on edge nodes becomes promising. In such a method, the cluster head of a cluster of edge nodes schedules all the DL training jobs from the cluster nodes. Using such a centralized scheduling method, the cluster head knows all the loads of the cluster nodes, which can avoid overloading the cluster nodes, but the head itself may become overloaded. To handle this problem, we first propose a multi-agent RL (MARL) system that enables each edge node to schedule its own jobs using RL. However, without the coordination between the nodes, action collision may occur, in which multiple nodes may schedule tasks to the same node and make it overloaded. To avoid these problems, we propose a system called Shielded ReinfOrcement learning (RL) based DL training on Edges (SROLE). In SROLE, each edge node schedules its own jobs using multi-agent RL. The shield deployed in a node checks action collisions and provides alternative actions to avoid the collisions. As the central shield node for the entire cluster may become a bottleneck, we further propose a decentralized shielding method, in which different shields are responsible for different regions in the cluster and they coordinate to avoid action collisions on the region boundaries. Our container-based emulation experiments show that SROLE reduces training time by up to 59% with 29% lower median resource utilization and reduces the number of action collisions by up to 48% compared to multi-agent RL and the centralized RL. Our real device experiments show that SROLE still reduces the training time by up to 53% with 28% lower median resource utilization than multi-agent RL and the centralized RL.
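The collision check described in the abstract can be illustrated with a minimal sketch. This is not the authors' SROLE implementation: the `Proposal` type, the `shield` function, the normalized capacity units, and the fallback rule (reassign a colliding job to the node with the most free capacity) are all assumptions made here for illustration. The record only states that the shield detects action collisions and supplies alternative actions.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """One scheduling action proposed by an agent (hypothetical type)."""
    agent: str     # edge node that owns the job
    target: str    # node the agent wants to place the job on
    demand: float  # resources the job needs, in assumed normalized units


def shield(proposals, free_capacity):
    """Approve proposals in order and replace any action that would overload its target.

    Returns a list of (agent, node) pairs; node is None when no node can fit the job.
    The fallback rule (most free capacity) is an assumption, not taken from the paper.
    """
    remaining = dict(free_capacity)  # copy so the caller's view is not mutated
    approved = []
    for p in proposals:
        if remaining.get(p.target, 0.0) >= p.demand:
            chosen = p.target  # no collision: keep the agent's own action
        else:
            # Collision: look for an alternative node that can still fit the job.
            candidates = [n for n, cap in remaining.items() if cap >= p.demand]
            chosen = max(candidates, key=lambda n: remaining[n]) if candidates else None
        if chosen is not None:
            remaining[chosen] -= p.demand
        approved.append((p.agent, chosen))
    return approved


# Two agents pick the same node; the second is redirected by the shield.
capacity = {"n1": 0.5, "n2": 0.9, "n3": 1.0}
result = shield(
    [Proposal("A", "n3", 0.6), Proposal("B", "n3", 0.6)],
    capacity,
)
print(result)
```

In this sketch a single shield sees every proposal. The decentralized variant mentioned in the abstract would instead run one such check per region and coordinate on nodes at region boundaries, which the sketch does not attempt.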
Pages: 581-591
Number of pages: 11