DNN Placement and Inference in Edge Computing

Cited by: 0
Authors
Bensalem, Mounir [1 ]
Dizdarevic, Jasenka [1 ]
Jukan, Admela [1 ]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany
Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
The deployment of deep neural network (DNN) models in software applications is increasing rapidly with the exponential growth of artificial intelligence. Currently, such models are deployed manually by developers in the cloud while taking several user requirements into account, and the decisions of model selection and user assignment are difficult to make. With the rise of the edge computing paradigm, companies tend to deploy applications as close as possible to the user. In such a system, the problem of DNN model selection and inference serving becomes harder due to the communication latency introduced between nodes. We present an automatic method for DNN placement and inference in edge computing: a mathematical formulation of the DNN Model Variant Selection and Placement (MVSP) problem that considers the inference latency of different model-variants, the communication latency between nodes, and the utilization cost of edge computing nodes. Furthermore, we propose a general heuristic algorithm to solve the MVSP problem. We analyze the effects of hardware sharing on inference latency, using the example of GPU edge computing nodes shared between different DNN model-variants. We evaluate our model numerically and show the potential of GPU sharing, with millisecond-scale average latency per request decreased by 33% for low load and by 21% for high load. We study the tradeoff between latency and cost and show the Pareto-optimal curves. Finally, we compare the optimal solution with the proposed heuristic and show that the average latency per request increases by more than 60%; this can be improved with more efficient placement algorithms.
Pages: 479 - 484
Number of pages: 6
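
As a rough illustration of the model-variant selection and placement trade-off described in the abstract, the short Python sketch below greedily assigns each user to a (node, model-variant) pair by minimizing a weighted sum of communication latency, inference latency, and node utilization cost under per-node capacity limits. It is only a sketch under assumed inputs: the node names, latency figures, capacities, cost weight, and the greedy rule itself are illustrative and are not taken from the paper's MVSP formulation or its heuristic.

```python
# Hypothetical greedy sketch of DNN Model Variant Selection and Placement (MVSP).
# NOT the paper's formulation or heuristic: all names and numbers below are
# illustrative assumptions only.
from itertools import product

# Candidate nodes: capacity = concurrent model instances, cost = utilization cost per instance.
NODES = {
    "edge-gpu-1": {"capacity": 2, "cost": 1.0},
    "edge-gpu-2": {"capacity": 2, "cost": 1.0},
    "cloud":      {"capacity": 8, "cost": 0.3},
}

# Model variants of one DNN and their inference latency in milliseconds.
VARIANTS = {
    "resnet50-fp32": 30.0,
    "resnet50-int8": 12.0,
}

# Communication latency in milliseconds from each user to each node.
COMM = {
    ("user-a", "edge-gpu-1"): 2.0, ("user-a", "edge-gpu-2"): 5.0, ("user-a", "cloud"): 40.0,
    ("user-b", "edge-gpu-1"): 6.0, ("user-b", "edge-gpu-2"): 2.0, ("user-b", "cloud"): 40.0,
    ("user-c", "edge-gpu-1"): 4.0, ("user-c", "edge-gpu-2"): 4.0, ("user-c", "cloud"): 40.0,
}

USERS = ["user-a", "user-b", "user-c"]
COST_WEIGHT = 5.0  # assumed trade-off weight between latency (ms) and utilization cost


def greedy_mvsp(users, nodes, variants, comm, cost_weight):
    """Greedily assign each user a (node, variant) pair that minimizes a weighted
    sum of end-to-end latency (communication + inference) and placement cost,
    while respecting per-node capacity. This gives only a rough upper bound on
    what an exact MVSP solution would achieve."""
    load = {n: 0 for n in nodes}  # instances already placed per node
    assignment = {}
    for user in users:
        best = None
        for node, variant in product(nodes, variants):
            if load[node] >= nodes[node]["capacity"]:
                continue  # node is full, skip it
            latency = comm[(user, node)] + variants[variant]
            score = latency + cost_weight * nodes[node]["cost"]
            if best is None or score < best[0]:
                best = (score, node, variant, latency)
        if best is None:
            raise RuntimeError(f"no capacity left for {user}")
        _, node, variant, latency = best
        load[node] += 1
        assignment[user] = (node, variant, latency)
    return assignment


if __name__ == "__main__":
    for user, (node, variant, latency) in greedy_mvsp(
            USERS, NODES, VARIANTS, COMM, COST_WEIGHT).items():
        print(f"{user}: {variant} on {node}, end-to-end latency {latency:.1f} ms")
```

An exact MVSP solution would instead decide all placements and assignments jointly (for example, as an integer program), which is the kind of optimal baseline the abstract compares the proposed heuristic against.
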
Related Papers
50 items in total
  • [21] A Survey on Collaborative DNN Inference for Edge Intelligence
    Ren, Wei-Qing
    Qu, Yu-Ben
    Dong, Chao
    Jing, Yu-Qian
    Sun, Hao
    Wu, Qi-Hui
    Guo, Song
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (03) : 370 - 395
  • [23] DNN Partitioning for Inference Throughput Acceleration at the Edge
    Feltin, Thomas
    Marcho, Leo
    Cordero-Fuertes, Juan-Antonio
    Brockners, Frank
    Clausen, Thomas H.
    IEEE ACCESS, 2023, 11 : 52236 - 52249
  • [25] Energy-optimal DNN model placement in UAV-enabled edge computing networks
    Tang, Jianhang
    Wu, Guoquan
    Jalalzai, Mohammad Mussadiq
    Wang, Lin
    Zhang, Bing
    Zhou, Yi
    DIGITAL COMMUNICATIONS AND NETWORKS, 2024, 10 (04) : 827 - 836
  • [27] Edge intelligence in motion: Mobility-aware dynamic DNN inference service migration with downtime in mobile edge computing
    Wang, Pu
    Ouyang, Tao
    Liao, Guocheng
    Gong, Jie
    Yu, Shuai
    Chen, Xu
    JOURNAL OF SYSTEMS ARCHITECTURE, 2022, 130
  • [28] A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system
    Shi, Lei
    Xu, Zhigang
    Sun, Yabo
    Shi, Yi
    Fan, Yuqi
    Ding, Xu
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2021, 14 : 4031 - 4045
  • [29] Adaptive DNN Partition in Edge Computing Environments
    Miao, Weiwei
    Zeng, Zeng
    Wei, Lei
    Li, Shihao
    Jiang, Chengling
    Zhang, Zhen
    2020 IEEE 26TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2020, : 685 - 690
  • [30] DNN inference offloading for object detection in 5G multi-access edge computing
    Kim, Geun-Yong
    Kim, Ryangsoo
    Kim, Sungchang
    Nam, Ki-Dong
    Rha, Sung-Uk
    Yoon, Jung-Hyun
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 389 - 392