POS: An Operator Scheduling Framework for Multi-model Inference on Edge Intelligent Computing

Cited by: 5
Authors
Zhang, Ziyang [1]
Li, Huan [2]
Zhao, Yang [2]
Lin, Changyao [1]
Liu, Jie [2]
Affiliations
[1] Harbin Inst Technol, Harbin, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Guangdong, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
edge computing; multi-model inference; operator scheduling; deep reinforcement learning
DOI
10.1145/3583120.3586953
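Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract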
Edge intelligent applications, such as autonomous driving, usually deploy multiple inference models on resource-constrained edge devices to execute a diverse range of concurrent tasks on large amounts of input data. One challenge is that these tasks must produce reliable inference results simultaneously, with millisecond-level latency, to achieve real-time performance and high quality of service (QoS). However, most existing deep learning frameworks focus only on optimizing a single inference model on an edge device. To accelerate multi-model inference on a resource-constrained edge device, we propose POS, a novel operator-level scheduling framework that combines four operator scheduling strategies. The key to POS is MEOS, a maximum entropy reinforcement learning-based operator scheduling algorithm that generates an optimal schedule automatically. Extensive experiments show that POS consistently outperforms five state-of-the-art inference frameworks (TensorFlow, PyTorch, TensorRT, TVM, and IOS) with inference speedups of 1.2x to 3.9x and a 40% improvement in GPU utilization. Meanwhile, MEOS reduces scheduling overhead by 37% on average compared with five baseline methods: sequential execution, dynamic programming, greedy scheduling, actor-critic, and coordinate descent search.
Pages: 40-52
Page count: 13