Joint DNN Partition Deployment and Resource Allocation for Delay-Sensitive Deep Learning Inference in IoT

Cited by: 74
Authors
He, Wenchen [1]
Guo, Shaoyong [1]
Guo, Song [2, 3]
Qiu, Xuesong [1]
Qi, Feng [1, 4]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Res Inst Sustainable Urban Dev, Hong Kong, Peoples R China
[4] Cyberspace Secur Res Ctr, Peng Cheng Lab, Shenzhen 518066, Peoples R China
Source
IEEE INTERNET OF THINGS JOURNAL | 2020 / Vol. 7 / No. 10
Funding
National Natural Science Foundation of China;
Keywords
Delays; Task analysis; Resource management; Internet of Things; Computational modeling; Partitioning algorithms; Approximation algorithms; Deep learning (DL); delay sensitive; inference; Internet of Things (IoT); mobile-edge computing (MEC); partition deployment; resource allocation; EDGE; SERVICE; CLOUD; INTELLIGENCE; INTERNET; DISCOVERY; MIGRATION; QUALITY;
DOI
10.1109/JIOT.2020.2981338
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Internet-of-Things (IoT) mobile devices (MDs) are now widely used and generate huge volumes of data, which must be analyzed in real time by compute-intensive deep learning (DL) inference tasks to extract accurate information. Owing to its multilayer structure, a deep neural network (DNN) is well suited to the mobile-edge computing (MEC) environment: DL tasks can be offloaded to DNN partitions deployed in MEC servers (MECSs) to speed up inference. In this article, we first model the arrival of DL tasks as a Poisson process and develop a tandem queueing model to evaluate the end-to-end (E2E) inference delay of DL tasks across multiple DNN partitions. To minimize the E2E delay, we formulate a joint optimization problem of partition deployment and resource allocation in MECSs (JPDRA). Since the JPDRA is a mixed-integer nonlinear programming (MINLP) problem, we decompose it into a computing resource allocation (CRA) problem with a fixed partition deployment decision and a DNN partition deployment (DPD) problem that optimizes the optimal-delay function of the CRA problem. Next, we design a CRA algorithm based on Markov approximation and a low-complexity DPD algorithm to obtain a near-optimal solution in polynomial time. The simulation results demonstrate that the proposed algorithms are more efficient and can reduce the average E2E delay by 25.7% with better convergence performance.
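The tandem-queue idea in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each DNN partition behaves as an M/M/1 queue (Poisson arrivals, exponential service times), so that the mean E2E delay is the sum of the per-stage sojourn times 1/(mu_i - lambda). The function name and the numeric rates are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def tandem_mm1_delay(arrival_rate: float, service_rates: Sequence[float]) -> float:
    """Mean end-to-end sojourn time through M/M/1 queues in series.

    Assumption (not from the paper): every stage has exponential service
    times, so the departure process of each stage is again Poisson with the
    same rate and the stages can be analyzed independently.
    """
    total = 0.0
    for mu in service_rates:
        if mu <= arrival_rate:
            # A stage that cannot keep up with arrivals has unbounded delay.
            raise ValueError("unstable stage: service rate must exceed arrival rate")
        total += 1.0 / (mu - arrival_rate)
    return total


# Hypothetical example: 2 tasks/s arriving, two partitions in series.
balanced = tandem_mm1_delay(2.0, [5.0, 5.0])  # equal split of 10 tasks/s capacity
skewed = tandem_mm1_delay(2.0, [3.0, 7.0])    # same total capacity, uneven split
print(balanced, skewed)
```

Under these assumptions the same total service capacity gives a lower mean delay when split evenly than when skewed, which hints at why partition deployment and computing resource allocation are coupled in the problem the abstract describes.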
Pages: 9241 - 9254
Page count: 14
Related papers
50 in total
  • [41] Deep Reinforcement Learning Based Dynamic Routing Optimization for Delay-Sensitive Applications
    Chen, Jiawei
    Xiao, Yang
    Lin, Guocheng
    He, Gang
    Liu, Fang
    Zhou, Wenli
    Liu, Jun
    [J]. IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 5208 - 5213
  • [42] Delay-Sensitive Energy-Efficient UAV Crowdsensing by Deep Reinforcement Learning
    Dai, Zipeng
    Liu, Chi Harold
    Han, Rui
    Wang, Guoren
    Leung, Kin K. K.
    Tang, Jian
    [J]. IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (04) : 2038 - 2052
  • [43] On Delay-Sensitive Healthcare Data Analytics at the Network Edge Based on Deep Learning
    Fadlullah, Zubair Md.
    Pathan, Al-Sakib Khan
    Gacanin, Haris
    [J]. 2018 14TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC), 2018, : 388 - 393
  • [44] Learning-Based Memory Allocation Optimization for Delay-Sensitive Big Data Processing
    Tsai, Linjiun
    Franke, Hubertus
    Li, Chung-Sheng
    Liao, Wanjiun
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (06) : 1332 - 1341
  • [45] Joint resource allocation for emotional 5G IoT systems using deep reinforcement learning
    Yang, Ziyan
    Mei, Haibo
    Wang, Wenyong
    Zhou, Dongdai
    Yang, Kun
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (12) : 3517 - 3528
  • [46] Joint resource allocation for emotional 5G IoT systems using deep reinforcement learning
    Ziyan Yang
    Haibo Mei
    Wenyong Wang
    Dongdai Zhou
    Kun Yang
    [J]. International Journal of Machine Learning and Cybernetics, 2021, 12 : 3517 - 3528
  • [47] Buffer-Aware and Delay-Sensitive Resource Allocation in the Uplink of 3GPP LTE Networks
    Wang, Chiapin
    Huang, Jeng-Ji
    Su, Chung-Yen
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2015, 84 (03) : 1877 - 1890
  • [48] Buffer-Aware and Delay-Sensitive Resource Allocation in the Uplink of 3GPP LTE Networks
    Chiapin Wang
    Jeng-Ji Huang
    Chung-Yen Su
    [J]. Wireless Personal Communications, 2015, 84 : 1877 - 1890
  • [49] Deadline-Aware Multicast Resource Allocation in SDM-EONs With Fluctuating Delay-Sensitive Traffic
    Samuel, Aretor
    Zhang, Yudong
    Zhu, Ruijie
    [J]. JOURNAL OF LIGHTWAVE TECHNOLOGY, 2022, 40 (16) : 5355 - 5368
  • [50] Accuracy-Guaranteed Collaborative DNN Inference in Industrial IoT via Deep Reinforcement Learning
    Wu, Wen
    Yang, Peng
    Zhang, Weiting
    Zhou, Conghao
    Shen, Xuemin
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (07) : 4988 - 4998