Cutting-Edge Inference: Dynamic DNN Model Partitioning and Resource Scaling for Mobile AI

Cited by: 0
Authors
Lim, Jeong-A [1 ]
Lee, Joohyun [2 ]
Kwak, Jeongho [3 ]
Kim, Yeongjin [1 ]
Affiliations
[1] Inha University, Department of Electronic Engineering, Incheon 22212, Republic of Korea
[2] Hanyang University, Department of Electrical and Electronic Engineering, Ansan 15588, Republic of Korea
[3] Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
Source
IEEE Transactions on Services Computing
Funding
National Research Foundation of Singapore
Keywords
Augmented reality; Deep neural networks; Mobile edge computing; Stochastic control systems; Stochastic models
DOI
10.1109/TSC.2024.3466848
Abstract
Applications that use artificial intelligence (AI) techniques on mobile devices, such as augmented reality, have recently become pervasive. The hardware specifications of mobile devices, dynamic service demands, stochastic network states, and the characteristics of deep neural network (DNN) models all affect the quality of experience (QoE) of such applications. In this paper, we propose CutEdge, which leverages a virtual-queue-based Lyapunov optimization framework to jointly optimize DNN model partitioning between a mobile device and a mobile edge computing (MEC) server and the processing/networking resources of the mobile device under internal and external system dynamics. Specifically, CutEdge simultaneously decides (i) the partition point of the DNN model between the mobile device and the MEC server, (ii) the GPU clock frequency, and (iii) the transmission rate of the mobile device. We then theoretically characterize the optimal trade-off curves among energy consumption, throughput, and end-to-end latency achieved by CutEdge; these QoE metrics have not been jointly addressed in previous studies. Moreover, we show the impact of jointly optimizing the three control parameters on performance via real-trace-driven simulations. Finally, we demonstrate the superiority of CutEdge over existing algorithms through experiments on a testbed built with an embedded AI device and an MEC server.
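The controller the abstract describes follows the standard drift-plus-penalty pattern of Lyapunov optimization: a virtual queue tracks how far the system runs from its latency target, and each slot the action minimizing V*(penalty) + Q*(drift) is chosen. The Python sketch below illustrates that pattern for the three CutEdge controls (partition point, GPU clock frequency, transmission rate) under toy cost models; every profile, coefficient, and name (cost_of, cutedge_step, etc.) is a hypothetical placeholder for exposition, not the paper's actual formulation.

import itertools

# Minimal drift-plus-penalty sketch of a CutEdge-style controller.
# All numbers below are illustrative assumptions, not values from the paper.
LOCAL_COST = [1.0, 2.0, 1.5, 3.0]   # per-layer work on the mobile GPU (normalized)
OUT_SIZE   = [8.0, 4.0, 2.0, 0.5]   # per-layer output size in Mb
INPUT_SIZE = 12.0                   # raw input size in Mb (sent when p = 0)
SERVER_SPEED = 4.0                  # MEC server speed relative to GPU clock 1.0

GPU_FREQS = [0.5, 1.0, 1.5]         # selectable GPU clock levels (normalized)
TX_RATES  = [2.0, 5.0, 10.0]        # selectable transmission rates (Mb/s)
KAPPA, P_TX = 0.3, 1.2              # toy compute/radio energy coefficients
LATENCY_BUDGET = 4.0                # target time-average end-to-end latency (s)
V = 10.0                            # trade-off weight: larger V favors energy savings

def cost_of(p, f, r):
    """Energy and latency when layers [0, p) run locally at clock f, rest offloaded at rate r."""
    local_work = sum(LOCAL_COST[:p])
    data = INPUT_SIZE if p == 0 else OUT_SIZE[p - 1]
    t_local, t_tx = local_work / f, data / r
    t_server = sum(LOCAL_COST[p:]) / SERVER_SPEED
    # GPU energy: power ~ kappa * f^3 over time local_work / f  =>  kappa * f^2 * work
    energy = KAPPA * f**2 * local_work + P_TX * t_tx
    return energy, t_local + t_tx + t_server

def cutedge_step(q_delay):
    """One slot: minimize V*energy + Q*latency over all actions, then update the virtual queue."""
    actions = itertools.product(range(len(LOCAL_COST) + 1), GPU_FREQS, TX_RATES)
    best = min(actions, key=lambda a: V * cost_of(*a)[0] + q_delay * cost_of(*a)[1])
    _, latency = cost_of(*best)
    q_delay = max(q_delay + latency - LATENCY_BUDGET, 0.0)  # backlog grows when the budget is missed
    return best, q_delay

q = 0.0
for slot in range(5):
    (p, f, r), q = cutedge_step(q)
    print(f"slot {slot}: cut after layer {p}, GPU clock {f}, tx rate {r} Mb/s, Q = {q:.2f}")

Raising V steers decisions toward lower energy at the cost of a larger latency backlog Q; sweeping V traces out an energy/latency trade-off curve of the kind the paper characterizes.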
Pages: 3300 - 3316
Related Papers
50 records in total (first 10 shown)
  • [1] Distributed DNN Inference With Fine-Grained Model Partitioning in Mobile Edge Computing Networks
    Li, Hui
    Li, Xiuhua
    Fan, Qilin
    He, Qiang
    Wang, Xiaofei
    Leung, Victor C. M.
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (10) : 9060 - 9074
  • [2] Joint Optimization With DNN Partitioning and Resource Allocation in Mobile Edge Computing
    Dong, Chongwu
    Hu, Sheng
    Chen, Xi
    Wen, Wushao
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (04): : 3973 - 3986
  • [3] DNN Partitioning for Inference Throughput Acceleration at the Edge
    Feltin, Thomas
    Marcho, Leo
    Cordero-Fuertes, Juan-Antonio
    Brockners, Frank
    Clausen, Thomas H.
    IEEE ACCESS, 2023, 11 : 52236 - 52249
  • [4] DNN Surgery: Accelerating DNN Inference on the Edge Through Layer Partitioning
    Liang, Huanghuang
    Sang, Qianlong
    Hu, Chuang
    Cheng, Dazhao
    Zhou, Xiaobo
    Wang, Dan
    Bao, Wei
    Wang, Yu
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (03) : 3111 - 3125
  • [5] Collaborative Inference Acceleration Integrating DNN Partitioning and Task Offloading in Mobile Edge Computing
    Xu, Wenxiu
    Yin, Yin
    Chen, Ningjiang
    Tu, Huan
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2023, 33 (11N12) : 1835 - 1863
  • [6] Joint DNN partitioning and resource allocation for completion rate maximization of delay-aware DNN inference tasks in wireless powered mobile edge computing
    Tian, Xianzhong
    Xu, Pengcheng
    Shen, Yifan
    Shao, Yuheng
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2023, 16 (06) : 2865 - 2878
  • [7] Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism
    Li, Jing
    Liang, Weifa
    Li, Yuchen
    Xu, Zichuan
    Jia, Xiaohua
    Guo, Song
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (05) : 3017 - 3030
  • [8] Joint Model, Task Partitioning and Privacy Preserving Adaptation for Edge DNN Inference
    Jiang, Jingran
    Li, Hongjia
    Wang, Liming
    2022 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2022, : 1224 - 1229
  • [9] DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing
    Li, Chao
    Xu, Hongli
    Xu, Yang
    Wang, Zhiyuan
    Huang, Liusheng
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I, 2021, 12937 : 465 - 478