Joint DNN Partition Deployment and Resource Allocation for Delay-Sensitive Deep Learning Inference in IoT

Cited by: 74
Authors
He, Wenchen [1]
Guo, Shaoyong [1]
Guo, Song [2, 3]
Qiu, Xuesong [1]
Qi, Feng [1, 4]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Res Inst Sustainable Urban Dev, Hong Kong, Peoples R China
[4] Cyberspace Secur Res Ctr, Peng Cheng Lab, Shenzhen 518066, Peoples R China
Source
IEEE INTERNET OF THINGS JOURNAL | 2020, Vol. 7, No. 10
Funding
National Natural Science Foundation of China
Keywords
Delays; Task analysis; Resource management; Internet of Things; Computational modeling; Partitioning algorithms; Approximation algorithms; Deep learning (DL); delay sensitive; inference; Internet of Things (IoT); mobile-edge computing (MEC); partition deployment; resource allocation; EDGE; SERVICE; CLOUD; INTELLIGENCE; INTERNET; DISCOVERY; MIGRATION; QUALITY;
DOI
10.1109/JIOT.2020.2981338
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Widely used Internet-of-Things (IoT) mobile devices (MDs) generate huge volumes of data, which must be analyzed in real time by compute-intensive deep learning (DL) inference tasks to extract accurate information. Because of its multilayer structure, a deep neural network (DNN) is well suited to the mobile-edge computing (MEC) environment: DL tasks can be offloaded to DNN partitions deployed on MEC servers (MECSs) to speed up inference. In this article, we first assume that DL tasks arrive according to a Poisson process and develop a tandem queueing model to evaluate the end-to-end (E2E) inference delay of DL tasks across multiple DNN partitions. To minimize the E2E delay, we formulate a joint optimization problem of partition deployment and resource allocation in MECSs (JPDRA). Since the JPDRA is a mixed-integer nonlinear programming (MINLP) problem, we decompose it into a computing resource allocation (CRA) problem with a fixed partition deployment decision and a DNN partition deployment (DPD) problem that optimizes the optimal-delay function related to the CRA problem. Next, we design a CRA algorithm based on Markov approximation and a low-complexity DPD algorithm to obtain a near-optimal solution in polynomial time. The simulation results demonstrate that the proposed algorithms are more efficient and can reduce the average E2E delay by 25.7% with better convergence performance.
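As a rough illustration of the tandem queueing idea in the abstract, the sketch below computes the mean end-to-end delay of a chain of queues. It assumes each DNN partition behaves as an M/M/1 queue (Poisson arrivals, exponential service) and ignores transmission delay between partitions; the abstract only states Poisson arrivals, so the paper's actual model may differ. The function name `tandem_mm1_delay` and the example rates are hypothetical.

```python
def tandem_mm1_delay(arrival_rate, service_rates):
    """Mean end-to-end sojourn time of M/M/1 queues in series.

    Assumption (not from the paper): every stage is M/M/1, so each
    stage sees Poisson arrivals at the same rate and contributes a
    mean delay of 1 / (mu - lambda).
    """
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    total = 0.0
    for mu in service_rates:
        # A stage is stable only if it serves faster than tasks arrive.
        if mu <= arrival_rate:
            raise ValueError("unstable stage: service rate must exceed arrival rate")
        total += 1.0 / (mu - arrival_rate)
    return total


# Hypothetical example: 10 tasks/s through three partitions.
delay = tandem_mm1_delay(10.0, [15.0, 20.0, 12.0])
print(f"mean E2E delay: {delay:.3f} s")
```

Under these assumptions the slowest stage dominates the total, which is why shifting computing resources toward the bottleneck partition tends to lower the end-to-end delay.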
Pages: 9241-9254
Page count: 14
Related papers
50 in total
  • [1] DNN Deployment, Task Offloading, and Resource Allocation for Joint Task Inference in IIoT
    Fan, Wenhao
    Chen, Zeyu
    Hao, Zhibo
    Su, Yi
    Wu, Fan
    Tang, Bihua
    Liu, Yuan'an
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (02) : 1634 - 1646
  • [2] Joint Task Offloading and Resource Allocation for Delay-sensitive Fog Networks
    Mukherjee, Mithun
    Kumar, Suman
    Shojafar, Mohammad
    Zhang, Qi
    Mavromoustakis, Constandinos X.
    [J]. ICC 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2019,
  • [3] Distributed Resource Allocation With Federated Learning for Delay-Sensitive IoV Services
    Song, Xiaoqin
    Hua, Yuqing
    Yang, Yang
    Xing, Guoliang
    Liu, Fang
    Xu, Lei
    Song, Tiecheng
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (03) : 4326 - 4336
  • [4] Deep Reinforcement Learning Based Resource Management for DNN Inference in Industrial IoT
    Zhang, Weiting
    Yang, Dong
    Peng, Haixia
    Wu, Wen
    Quan, Wei
    Zhang, Hongke
    Shen, Xuemin
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2021, 70 (08) : 7605 - 7618
  • [5] Learning-Based Proactive Resource Allocation for Delay-Sensitive Packet Transmission
    Chen, Jiayin
    Yang, Peng
    Ye, Qiang
    Zhuang, Weihua
    Shen, Xuemin
    Li, Xu
    [J]. IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2021, 7 (02) : 675 - 688
  • [6] Joint DNN Partition and Resource Allocation for Task Offloading in Edge-Cloud-Assisted IoT Environments
    Fan, Wenhao
    Gao, Li
    Su, Yi
    Wu, Fan
    Liu, Yuan'an
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (12) : 10146 - 10159
  • [7] Resource Allocation for Network Slices with Delay-Sensitive Multimedia Services
    Gao, Jing
    Zhou, Fanqin
    Sun, Gang
    Feng, Lei
    Li, Wenjing
    Qiu, Xuesong
    Chen, Xiaolu
    [J]. 2020 IEEE INTERNATIONAL SYMPOSIUM ON BROADBAND MULTIMEDIA SYSTEMS AND BROADCASTING (BMSB), 2020,
  • [8] Delay-sensitive resource allocation for IoT systems in 5G O-RAN networks
    Firouzi, Ramin
    Rahmani, Rahim
    [J]. INTERNET OF THINGS, 2024, 26
  • [9] Minimizing the Deployment Cost of UAVs for Delay-Sensitive Data Collection in IoT Networks
    Xu, Wenzheng
    Xiao, Tao
    Zhang, Junqi
    Liang, Weifa
    Xu, Zichuan
    Liu, Xuxun
    Jia, Xiaohua
    Das, Sajal K.
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2022, 30 (02) : 812 - 825
  • [10] Supporting Delay-Sensitive IoT Applications: A Machine Learning Approach
    Alnoman, Ali
    [J]. 2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2020,