Cutting-Edge Inference: Dynamic DNN Model Partitioning and Resource Scaling for Mobile AI

Cited by: 0
Authors
Lim, Jeong-A [1 ]
Lee, Joohyun [2 ]
Kwak, Jeongho [3 ]
Kim, Yeongjin [1 ]
Affiliations
[1] Inha University, Department of Electronic Engineering, Incheon 22212, Republic of Korea
[2] Hanyang University, Department of Electrical and Electronic Engineering, Ansan 15588, Republic of Korea
[3] Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
Source
IEEE Transactions on Services Computing
Funding
National Research Foundation of Singapore
Keywords
Augmented reality; Deep neural networks; Mobile edge computing; Stochastic control systems; Stochastic models
DOI
10.1109/TSC.2024.3466848
Abstract
Applications that use artificial intelligence (AI) techniques on mobile devices, such as augmented reality, have recently become pervasive. The hardware specifications of mobile devices, dynamic service demands, stochastic network states, and the characteristics of deep neural network (DNN) models all affect the quality of experience (QoE) of such applications. In this paper, we propose CutEdge, which leverages a virtual-queue-based Lyapunov optimization framework to jointly optimize DNN model partitioning between a mobile device and a mobile edge computing (MEC) server and the processing/networking resources of the mobile device under internal and external system dynamics. Specifically, CutEdge simultaneously decides (i) the partition point of the DNN model between the mobile device and the MEC server, (ii) the GPU clock frequency, and (iii) the transmission rate of the mobile device. We then theoretically characterize the optimal trade-off curves among energy consumption, throughput, and end-to-end latency achieved by CutEdge; these QoE metrics have not been jointly addressed in previous studies. Moreover, we show the impact of jointly optimizing the three control parameters on performance via real-trace-driven simulations. Finally, we demonstrate the superiority of CutEdge over existing algorithms through experiments on a testbed built with an embedded AI device and an MEC server.
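The controller the abstract describes follows the standard drift-plus-penalty pattern of Lyapunov optimization: a virtual queue tracks how far the system runs from its latency target, and each slot the action minimizing V*(penalty) + Q*(drift) is chosen. The Python sketch below illustrates that pattern for the three CutEdge controls (partition point, GPU clock frequency, transmission rate) under toy cost models; every profile, coefficient, and name (cost_of, cutedge_step, etc.) is a hypothetical placeholder for exposition, not the paper's actual formulation.

import itertools

# Minimal drift-plus-penalty sketch of a CutEdge-style controller.
# All numbers below are illustrative assumptions, not values from the paper.
LOCAL_COST = [1.0, 2.0, 1.5, 3.0]   # per-layer work on the mobile GPU (normalized)
OUT_SIZE   = [8.0, 4.0, 2.0, 0.5]   # per-layer output size in Mb
INPUT_SIZE = 12.0                   # raw input size in Mb (sent when p = 0)
SERVER_SPEED = 4.0                  # MEC server speed relative to GPU clock 1.0

GPU_FREQS = [0.5, 1.0, 1.5]         # selectable GPU clock levels (normalized)
TX_RATES  = [2.0, 5.0, 10.0]        # selectable transmission rates (Mb/s)
KAPPA, P_TX = 0.3, 1.2              # toy compute/radio energy coefficients
LATENCY_BUDGET = 4.0                # target time-average end-to-end latency (s)
V = 10.0                            # trade-off weight: larger V favors energy savings

def cost_of(p, f, r):
    """Energy and latency when layers [0, p) run locally at clock f, rest offloaded at rate r."""
    local_work = sum(LOCAL_COST[:p])
    data = INPUT_SIZE if p == 0 else OUT_SIZE[p - 1]
    t_local, t_tx = local_work / f, data / r
    t_server = sum(LOCAL_COST[p:]) / SERVER_SPEED
    # GPU energy: power ~ kappa * f^3 over time local_work / f  =>  kappa * f^2 * work
    energy = KAPPA * f**2 * local_work + P_TX * t_tx
    return energy, t_local + t_tx + t_server

def cutedge_step(q_delay):
    """One slot: minimize V*energy + Q*latency over all actions, then update the virtual queue."""
    actions = itertools.product(range(len(LOCAL_COST) + 1), GPU_FREQS, TX_RATES)
    best = min(actions, key=lambda a: V * cost_of(*a)[0] + q_delay * cost_of(*a)[1])
    _, latency = cost_of(*best)
    q_delay = max(q_delay + latency - LATENCY_BUDGET, 0.0)  # backlog grows when the budget is missed
    return best, q_delay

q = 0.0
for slot in range(5):
    (p, f, r), q = cutedge_step(q)
    print(f"slot {slot}: cut after layer {p}, GPU clock {f}, tx rate {r} Mb/s, Q = {q:.2f}")

Raising V steers decisions toward lower energy at the cost of a larger latency backlog Q; sweeping V traces out an energy/latency trade-off curve of the kind the paper characterizes.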
Pages: 3300 - 3316
Related Papers
50 records in total (first 10 shown)
  • [1] Distributed DNN Inference With Fine-Grained Model Partitioning in Mobile Edge Computing Networks
    Li, Hui
    Li, Xiuhua
    Fan, Qilin
    He, Qiang
    Wang, Xiaofei
    Leung, Victor C. M.
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (10) : 9060 - 9074
  • [2] Joint Optimization With DNN Partitioning and Resource Allocation in Mobile Edge Computing
    Dong, Chongwu
    Hu, Sheng
    Chen, Xi
    Wen, Wushao
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (04): : 3973 - 3986
  • [3] DNN Partitioning for Inference Throughput Acceleration at the Edge
    Feltin, Thomas
    Marcho, Leo
    Cordero-Fuertes, Juan-Antonio
    Brockners, Frank
    Clausen, Thomas H.
    IEEE ACCESS, 2023, 11 : 52236 - 52249
  • [4] DNN Surgery: Accelerating DNN Inference on the Edge Through Layer Partitioning
    Liang, Huanghuang
    Sang, Qianlong
    Hu, Chuang
    Cheng, Dazhao
    Zhou, Xiaobo
    Wang, Dan
    Bao, Wei
    Wang, Yu
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (03) : 3111 - 3125
  • [5] Collaborative Inference Acceleration Integrating DNN Partitioning and Task Offloading in Mobile Edge Computing
    Xu, Wenxiu
    Yin, Yin
    Chen, Ningjiang
    Tu, Huan
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2023, 33 (11N12) : 1835 - 1863
  • [6] Joint DNN partitioning and resource allocation for completion rate maximization of delay-aware DNN inference tasks in wireless powered mobile edge computing
    Tian, Xianzhong
    Xu, Pengcheng
    Shen, Yifan
    Shao, Yuheng
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2023, 16 (06) : 2865 - 2878
  • [7] Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism
    Li, Jing
    Liang, Weifa
    Li, Yuchen
    Xu, Zichuan
    Jia, Xiaohua
    Guo, Song
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (05) : 3017 - 3030
  • [8] Joint Model, Task Partitioning and Privacy Preserving Adaptation for Edge DNN Inference
    Jiang, Jingran
    Li, Hongjia
    Wang, Liming
    2022 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2022, : 1224 - 1229
  • [9] DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing
    Li, Chao
    Xu, Hongli
    Xu, Yang
    Wang, Zhiyuan
    Huang, Liusheng
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I, 2021, 12937 : 465 - 478