POS: An Operator Scheduling Framework for Multi-model Inference on Edge Intelligent Computing

Cited by: 5
Authors
Zhang, Ziyang [1 ]
Li, Huan [2 ]
Zhao, Yang [2 ]
Lin, Changyao [1 ]
Liu, Jie [2 ]
Affiliations
[1] Harbin Inst Technol, Harbin, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Guangdong, Peoples R China
Funding
National Key R&D Program of China;
Keywords
edge computing; multi-model inference; operator scheduling; deep reinforcement learning;
DOI
10.1145/3583120.3586953
CLC classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline codes
0808; 0809;
Abstract
Edge intelligent applications, such as autonomous driving, usually deploy multiple inference models on resource-constrained edge devices to execute a diverse range of concurrent tasks over large amounts of input data. One challenge is that these tasks must produce reliable inference results simultaneously, with millisecond-level latency, to achieve real-time performance and high quality of service (QoS). However, most existing deep learning frameworks focus only on optimizing a single inference model on an edge device. To accelerate multi-model inference on a resource-constrained edge device, in this paper we propose POS, a novel operator-level scheduling framework that combines four operator scheduling strategies. The key to POS is MEOS, a maximum entropy reinforcement learning-based operator scheduling algorithm that generates an optimal schedule automatically. Extensive experiments show that POS consistently outperforms five state-of-the-art inference frameworks (TensorFlow, PyTorch, TensorRT, TVM, and IOS) with 1.2x to 3.9x inference speedup and a 40% improvement in GPU utilization. Meanwhile, MEOS reduces scheduling overhead by 37% on average compared to five baseline methods: sequential execution, dynamic programming, greedy scheduling, actor-critic, and coordinate descent search.
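The abstract attributes POS's schedules to MEOS, a maximum entropy reinforcement learning algorithm. The paper's method is not reproduced here; the sketch below only illustrates the core maximum-entropy idea such algorithms build on: a softmax (Boltzmann) policy over action values, where a temperature parameter trades off exploitation against exploration when choosing, say, which candidate operator group to dispatch next. All names, Q-values, and the temperature setting are hypothetical.

```python
import math

def soft_policy(q_values, alpha=1.0):
    """Maximum-entropy (soft) policy: pi(a) proportional to exp(Q(a)/alpha).

    alpha is the temperature: large alpha pushes the policy toward uniform
    (more exploration), small alpha toward greedy argmax (pure exploitation).
    Uses the max-shift trick for numerical stability.
    """
    m = max(q / alpha for q in q_values)
    exps = [math.exp(q / alpha - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: estimated values (e.g. negative predicted latency) for three
# hypothetical candidate operator groups; numbers are illustrative only.
q = [-2.0, -1.0, -3.0]
probs = soft_policy(q, alpha=0.5)
best = max(range(len(probs)), key=probs.__getitem__)
```

With these toy values the policy concentrates on the second group (lowest predicted latency) while still assigning nonzero probability to the others, which is the entropy bonus at work: the scheduler keeps sampling alternative operator orderings instead of committing prematurely to one.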
Pages: 40-52
Page count: 13