Joint DNN Partition Deployment and Resource Allocation for Delay-Sensitive Deep Learning Inference in IoT

Cited by: 74

Authors:

He, Wenchen [1]

Guo, Shaoyong [1]

Guo, Song [2,3]

Qiu, Xuesong [1]

Qi, Feng [1,4]

Affiliations:

[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China

[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China

[3] Hong Kong Polytech Univ, Res Inst Sustainable Urban Dev, Hong Kong, Peoples R China

[4] Cyberspace Secur Res Ctr, Peng Cheng Lab, Shenzhen 518066, Peoples R China

Source:

IEEE INTERNET OF THINGS JOURNAL | 2020 / Vol. 7 / No. 10

Funding:

National Natural Science Foundation of China;

Keywords:

Delays; Task analysis; Resource management; Internet of Things; Computational modeling; Partitioning algorithms; Approximation algorithms; Deep learning (DL); delay sensitive; inference; Internet of Things (IoT); mobile-edge computing (MEC); partition deployment; resource allocation; EDGE; SERVICE; CLOUD; INTELLIGENCE; INTERNET; DISCOVERY; MIGRATION; QUALITY;

DOI:

10.1109/JIOT.2020.2981338

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Widely deployed Internet-of-Things (IoT) mobile devices (MDs) generate huge volumes of data, which must be analyzed in real time by compute-intensive deep learning (DL) inference tasks to extract accurate information. Because of its multilayer structure, the deep neural network (DNN) is well suited to the mobile-edge computing (MEC) environment: DL tasks can be offloaded to DNN partitions deployed in MEC servers (MECSs) to speed up inference. In this article, we first model the arrival of DL tasks as a Poisson process and develop a tandem queueing model to evaluate the end-to-end (E2E) inference delay of DL tasks across multiple DNN partitions. To minimize the E2E delay, we formulate a joint optimization problem of partition deployment and resource allocation in MECSs (JPDRA). Since the JPDRA is a mixed-integer nonlinear programming (MINLP) problem, we decompose it into a computing resource allocation (CRA) problem with a fixed partition deployment decision and a DNN partition deployment (DPD) problem that optimizes the optimal-delay function of the CRA problem. Next, we design a CRA algorithm based on Markov approximation and a low-complexity DPD algorithm to obtain a near-optimal solution in polynomial time. The simulation results demonstrate that the proposed algorithms are more efficient and can reduce the average E2E delay by 25.7% with better convergence performance.
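As a rough illustration of the tandem queueing idea in the abstract, the sketch below computes the mean E2E delay of a task that passes through a chain of DNN partitions, each treated as a queue. It assumes exponentially distributed service times at every partition, so that each stage behaves as an M/M/1 queue and the stage delays simply add up. That service-time assumption, the function name `tandem_mm1_delay`, and the example rates are illustrative choices made here; the abstract does not specify them, and the article's actual model may differ.

```python
from typing import Sequence


def tandem_mm1_delay(arrival_rate: float, service_rates: Sequence[float]) -> float:
    """Mean end-to-end sojourn time through M/M/1 queues in series.

    Assumption (not from the abstract): service times are exponential, so the
    output of each stage is again a Poisson stream with the same rate and the
    mean time spent at stage i is 1 / (mu_i - lambda).
    """
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    total = 0.0
    for mu in service_rates:
        # A stage is stable only if it serves tasks faster than they arrive.
        if mu <= arrival_rate:
            raise ValueError("unstable stage: service rate must exceed arrival rate")
        total += 1.0 / (mu - arrival_rate)
    return total


# Hypothetical example: 2 tasks/s arriving at three partitions in series.
delay = tandem_mm1_delay(2.0, [4.0, 6.0, 3.0])
print(delay)  # 0.5 + 0.25 + 1.0 = 1.75 seconds
```

In this simplified picture the slowest partition (service rate 3.0) contributes more than half of the total delay, which hints at why allocating computing resources across partitions and choosing where to deploy them are coupled decisions.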

Citation

Pages: 9241-9254

Page count: 14

