Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning

Cited by: 0

Authors:

Sen, Tanmoy [1]

Shen, Haiying [1]

Affiliations:

[1] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA

Source:

2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2022) | 2022

Keywords:

IoT

DOI:

10.1109/ICDCS54860.2022.00062

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

With the emergence of edge devices along with their local computation advantage over the cloud, distributed deep learning (DL) training on edge nodes becomes promising. In such a method, the cluster head of a cluster of edge nodes schedules all the DL training jobs from the cluster nodes. Using such a centralized scheduling method, the cluster head knows all the loads of the cluster nodes, which can avoid overloading the cluster nodes, but the head itself may become overloaded. To handle this problem, we first propose a multi-agent RL (MARL) system that enables each edge node to schedule its own jobs using RL. However, without the coordination between the nodes, action collision may occur, in which multiple nodes may schedule tasks to the same node and make it overloaded. To avoid these problems, we propose a system called Shielded ReinfOrcement learning (RL) based DL training on Edges (SROLE). In SROLE, each edge node schedules its own jobs using multi-agent RL. The shield deployed in a node checks action collisions and provides alternative actions to avoid the collisions. As the central shield node for the entire cluster may become a bottleneck, we further propose a decentralized shielding method, in which different shields are responsible for different regions in the cluster and they coordinate to avoid action collisions on the region boundaries. Our container-based emulation experiments show that SROLE reduces training time by up to 59% with 29% lower median resource utilization and reduces the number of action collisions by up to 48% compared to multi-agent RL and the centralized RL. Our real device experiments show that SROLE still reduces the training time by up to 53% with 28% lower median resource utilization than multi-agent RL and the centralized RL.
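The shield's role described in the abstract (checking for action collisions and supplying alternative actions) can be illustrated with a minimal sketch. This is not the SROLE algorithm itself: the function name `shield`, the integer load and capacity model, and the "most remaining headroom" fallback rule are all assumptions made for illustration only.

```python
def shield(proposed, load, capacity):
    """Return a copy of `proposed` with overloading actions redirected.

    proposed: {agent: (target_node, job_demand)}  (hypothetical format)
    load:     {node: current load}
    capacity: {node: maximum load}
    """
    projected = dict(load)  # load each node would have after accepted actions
    safe = {}
    # Visit agents in sorted order so the outcome is deterministic.
    for agent in sorted(proposed):
        target, demand = proposed[agent]
        if projected[target] + demand > capacity[target]:
            # Action collision: the target would be overloaded.
            # Assumed fallback: the node with the most remaining headroom
            # that can still fit the job (ties broken by node name).
            candidates = [n for n in capacity
                          if projected[n] + demand <= capacity[n]]
            if candidates:
                target = min(candidates,
                             key=lambda n: (projected[n] - capacity[n], n))
            # If no node fits, the original action is kept unchanged.
        projected[target] += demand
        safe[agent] = (target, demand)
    return safe


# Two agents pick the same node; the shield redirects the second one.
load = {"n1": 20, "n2": 10, "n3": 50}
capacity = {"n1": 100, "n2": 100, "n3": 100}
proposed = {"a": ("n1", 50), "b": ("n1", 50), "c": ("n3", 20)}
result = shield(proposed, load, capacity)
```

In this toy run, agents `a` and `b` both target `n1`; accepting both would exceed its capacity, so `b` is redirected to `n2`, while `c` keeps its original target.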


Pages: 581-591

Page count: 11

