DNN Placement and Inference in Edge Computing

Cited: 0
Authors
Bensalem, Mounir [1 ]
Dizdarevic, Jasenka [1 ]
Jukan, Admela [1 ]
Institution
[1] Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany
Keywords
DOI
None available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification
0808; 0809;
Abstract
The deployment of deep neural network (DNN) models in software applications is increasing rapidly with the exponential growth of artificial intelligence. Currently, such models are deployed manually by developers in the cloud, taking several user requirements into account, and the decisions of model selection and user assignment are difficult to make. With the rise of the edge computing paradigm, companies tend to deploy applications as close as possible to the user. In such a system, the problem of DNN model selection and inference serving becomes harder due to the communication latency introduced between nodes. We present an automatic method for DNN placement and inference in edge computing: a mathematical formulation of the DNN Model Variant Selection and Placement (MVSP) problem that considers the inference latency of different model variants, the communication latency between nodes, and the utilization cost of edge computing nodes. Furthermore, we propose a general heuristic algorithm to solve the MVSP problem. We analyze the effect of hardware sharing on inference latency, using the example of GPU edge computing nodes shared between different DNN model variants. We evaluate our model numerically and show the potential of GPU sharing, which decreases millisecond-scale average latency per request by 33% at low load and by 21% at high load. We study the tradeoff between latency and cost and show the Pareto-optimal curves. Finally, we compare the optimal solution with the proposed heuristic and show that the average latency per request increases by more than 60%; this can be improved with more efficient placement algorithms.
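The abstract describes a general heuristic for MVSP but does not specify it; the sketch below is a hypothetical greedy baseline, assuming per-variant inference latencies, per-node capacities and utilization costs, and a user-to-node communication-latency matrix as in the problem description. All names (`greedy_mvsp`, the data layout, the weight `alpha`) are illustrative assumptions, not the paper's algorithm.

```python
def greedy_mvsp(requests, variants, nodes, comm_latency, alpha=0.0):
    """Greedily assign each request a (model variant, node) pair.

    requests:     list of user ids
    variants:     dict variant -> {"latency": inference ms, "demand": resource units}
    nodes:        dict node -> {"capacity": resource units, "cost": per-unit cost}
    comm_latency: dict (user, node) -> network latency in ms
    alpha:        weight trading off utilization cost against latency
    """
    load = {n: 0 for n in nodes}      # resource units consumed per node
    placement = {}
    for user in requests:
        best = None
        for v, vinfo in variants.items():
            for n, ninfo in nodes.items():
                # skip nodes that cannot host this variant for another request
                if load[n] + vinfo["demand"] > ninfo["capacity"]:
                    continue
                score = (vinfo["latency"]
                         + comm_latency[(user, n)]
                         + alpha * ninfo["cost"] * vinfo["demand"])
                if best is None or score < best[0]:
                    best = (score, v, n)
        if best is None:
            placement[user] = None    # no feasible placement left
            continue
        _, v, n = best
        load[n] += variants[v]["demand"]
        placement[user] = (v, n)
    return placement
```

With `alpha = 0` this minimizes per-request latency only; raising `alpha` pushes requests toward cheaper nodes, which is one simple way to trace the latency/cost tradeoff the paper studies via Pareto-optimal curves.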
Pages: 479-484
Page count: 6
Related Papers
50 items
  • [1] Modeling of Deep Neural Network (DNN) Placement and Inference in Edge Computing
    Bensalem, Mounir
    Dizdarevic, Jasenka
    Jukan, Admela
    2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS (ICC WORKSHOPS), 2020,
  • [2] Joint Optimization of Device Placement and Model Partitioning for Cooperative DNN Inference in Heterogeneous Edge Computing
    Dai, Penglin
    Han, Biao
    Li, Ke
    Xu, Xincao
    Xing, Huanlai
    Liu, Kai
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (01) : 210 - 226
  • [3] Elastic DNN Inference With Unpredictable Exit in Edge Computing
    Huang, Jiaming
    Gao, Yi
    Dong, Wei
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12) : 14005 - 14016
  • [4] Elastic DNN Inference with Unpredictable Exit in Edge Computing
    Huang, Jiaming
    Gao, Yi
    Dong, Wei
    2023 IEEE 43RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS, 2023, : 293 - 304
  • [5] Accelerating DNN Inference With Reliability Guarantee in Vehicular Edge Computing
    Liu, Kai
    Liu, Chunhui
    Yan, Guozhi
    Lee, Victor C. S.
    Cao, Jiannong
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (06) : 3238 - 3253
  • [6] DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing
    Li, Chao
    Xu, Hongli
    Xu, Yang
    Wang, Zhiyuan
    Huang, Liusheng
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I, 2021, 12937 : 465 - 478
  • [7] Efficient Online DNN Inference with Continuous Learning in Edge Computing
    Zeng, Yifan
    Zhou, Ruiting
    Jia, Lei
    Han, Ziyi
    Yu, Jieling
    Ma, Yue
    2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024,
  • [8] Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism
    Li, Jing
    Liang, Weifa
    Li, Yuchen
    Xu, Zichuan
    Jia, Xiaohua
    Guo, Song
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (05) : 3017 - 3030
  • [9] Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing
    Li, Rui
    Ouyang, Tao
    Zeng, Liekang
    Liao, Guocheng
    Zhou, Zhi
    Chen, Xu
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (05) : 4414 - 4426
  • [10] ADDA: Adaptive Distributed DNN Inference Acceleration in Edge Computing Environment
    Wang, Huitian
    Cai, Guangxing
    Huang, Zhaowu
    Dong, Fang
    2019 IEEE 25TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2019, : 438 - 445