DNN Placement and Inference in Edge Computing

Cited: 0
Authors
Bensalem, Mounir [1 ]
Dizdarevic, Jasenka [1 ]
Jukan, Admela [1 ]
Institution
[1] Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany
Keywords
DOI
None available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification
0808; 0809;
Abstract
The deployment of deep neural network (DNN) models in software applications is increasing rapidly with the exponential growth of artificial intelligence. Currently, such models are deployed manually by developers in the cloud, taking several user requirements into account, and the decisions of model selection and user assignment are difficult to make. With the rise of the edge computing paradigm, companies tend to deploy applications as close as possible to the user. In such a system, the problem of DNN model selection and inference serving becomes harder due to the communication latency introduced between nodes. We present an automatic method for DNN placement and inference in edge computing: a mathematical formulation of the DNN Model Variant Selection and Placement (MVSP) problem that considers the inference latency of different model variants, the communication latency between nodes, and the utilization cost of edge computing nodes. Furthermore, we propose a general heuristic algorithm to solve the MVSP problem. We analyze the effect of hardware sharing on inference latency, using the example of GPU edge computing nodes shared between different DNN model variants. We evaluate our model numerically and show the potential of GPU sharing, which decreases millisecond-scale average latency per request by 33% at low load and by 21% at high load. We study the tradeoff between latency and cost and show the Pareto-optimal curves. Finally, we compare the optimal solution with the proposed heuristic and show that the average latency per request increases by more than 60%; this can be improved with more efficient placement algorithms.
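The abstract describes a general heuristic for MVSP but does not specify it; the sketch below is a hypothetical greedy baseline, assuming per-variant inference latencies, per-node capacities and utilization costs, and a user-to-node communication-latency matrix as in the problem description. All names (`greedy_mvsp`, the data layout, the weight `alpha`) are illustrative assumptions, not the paper's algorithm.

```python
def greedy_mvsp(requests, variants, nodes, comm_latency, alpha=0.0):
    """Greedily assign each request a (model variant, node) pair.

    requests:     list of user ids
    variants:     dict variant -> {"latency": inference ms, "demand": resource units}
    nodes:        dict node -> {"capacity": resource units, "cost": per-unit cost}
    comm_latency: dict (user, node) -> network latency in ms
    alpha:        weight trading off utilization cost against latency
    """
    load = {n: 0 for n in nodes}      # resource units consumed per node
    placement = {}
    for user in requests:
        best = None
        for v, vinfo in variants.items():
            for n, ninfo in nodes.items():
                # skip nodes that cannot host this variant for another request
                if load[n] + vinfo["demand"] > ninfo["capacity"]:
                    continue
                score = (vinfo["latency"]
                         + comm_latency[(user, n)]
                         + alpha * ninfo["cost"] * vinfo["demand"])
                if best is None or score < best[0]:
                    best = (score, v, n)
        if best is None:
            placement[user] = None    # no feasible placement left
            continue
        _, v, n = best
        load[n] += variants[v]["demand"]
        placement[user] = (v, n)
    return placement
```

With `alpha = 0` this minimizes per-request latency only; raising `alpha` pushes requests toward cheaper nodes, which is one simple way to trace the latency/cost tradeoff the paper studies via Pareto-optimal curves.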
Pages: 479-484
Page count: 6
Related Papers
50 items
  • [1] Modeling of Deep Neural Network (DNN) Placement and Inference in Edge Computing
    Bensalem, Mounir
    Dizdarevic, Jasenka
    Jukan, Admela
    2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS (ICC WORKSHOPS), 2020,
  • [2] Joint Optimization of Device Placement and Model Partitioning for Cooperative DNN Inference in Heterogeneous Edge Computing
    Dai, Penglin
    Han, Biao
    Li, Ke
    Xu, Xincao
    Xing, Huanlai
    Liu, Kai
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (01) : 210 - 226
  • [3] Elastic DNN Inference With Unpredictable Exit in Edge Computing
    Huang, Jiaming
    Gao, Yi
    Dong, Wei
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12) : 14005 - 14016
  • [4] Elastic DNN Inference with Unpredictable Exit in Edge Computing
    Huang, Jiaming
    Gao, Yi
    Dong, Wei
    2023 IEEE 43RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS, 2023, : 293 - 304
  • [5] Accelerating DNN Inference With Reliability Guarantee in Vehicular Edge Computing
    Liu, Kai
    Liu, Chunhui
    Yan, Guozhi
    Lee, Victor C. S.
    Cao, Jiannong
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (06) : 3238 - 3253
  • [6] DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing
    Li, Chao
    Xu, Hongli
    Xu, Yang
    Wang, Zhiyuan
    Huang, Liusheng
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I, 2021, 12937 : 465 - 478
  • [7] Efficient Online DNN Inference with Continuous Learning in Edge Computing
    Zeng, Yifan
    Zhou, Ruiting
    Jia, Lei
    Han, Ziyi
    Yu, Jieling
    Ma, Yue
    2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024,
  • [8] Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism
    Li, Jing
    Liang, Weifa
    Li, Yuchen
    Xu, Zichuan
    Jia, Xiaohua
    Guo, Song
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (05) : 3017 - 3030
  • [9] Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing
    Li, Rui
    Ouyang, Tao
    Zeng, Liekang
    Liao, Guocheng
    Zhou, Zhi
    Chen, Xu
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (05) : 4414 - 4426
  • [10] ADDA: Adaptive Distributed DNN Inference Acceleration in Edge Computing Environment
    Wang, Huitian
    Cai, Guangxing
    Huang, Zhaowu
    Dong, Fang
    2019 IEEE 25TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2019, : 438 - 445