POS: An Operator Scheduling Framework for Multi-model Inference on Edge Intelligent Computing

Cited by: 5
Authors
Zhang, Ziyang [1 ]
Li, Huan [2 ]
Zhao, Yang [2 ]
Lin, Changyao [1 ]
Liu, Jie [2 ]
Affiliations
[1] Harbin Inst Technol, Harbin, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Guangdong, Peoples R China
Funding
National Key R&D Program of China;
Keywords
edge computing; multi-model inference; operator scheduling; deep reinforcement learning;
DOI
10.1145/3583120.3586953
CLC classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline codes
0808; 0809;
Abstract
Edge intelligent applications, such as autonomous driving, usually deploy multiple inference models on resource-constrained edge devices to execute a diverse range of concurrent tasks over large amounts of input data. One challenge is that these tasks must produce reliable inference results simultaneously, with millisecond-level latency, to achieve real-time performance and high quality of service (QoS). However, most existing deep learning frameworks focus only on optimizing a single inference model on an edge device. To accelerate multi-model inference on a resource-constrained edge device, in this paper we propose POS, a novel operator-level scheduling framework that combines four operator scheduling strategies. The key to POS is MEOS, a maximum entropy reinforcement learning-based operator scheduling algorithm that generates an optimal schedule automatically. Extensive experiments show that POS consistently outperforms five state-of-the-art inference frameworks (TensorFlow, PyTorch, TensorRT, TVM, and IOS) with 1.2x to 3.9x inference speedup and a 40% improvement in GPU utilization. Meanwhile, MEOS reduces scheduling overhead by 37% on average compared to five baseline methods: sequential execution, dynamic programming, greedy scheduling, actor-critic, and coordinate descent search.
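The abstract attributes POS's schedules to MEOS, a maximum entropy reinforcement learning algorithm. The paper's method is not reproduced here; the sketch below only illustrates the core maximum-entropy idea such algorithms build on: a softmax (Boltzmann) policy over action values, where a temperature parameter trades off exploitation against exploration when choosing, say, which candidate operator group to dispatch next. All names, Q-values, and the temperature setting are hypothetical.

```python
import math

def soft_policy(q_values, alpha=1.0):
    """Maximum-entropy (soft) policy: pi(a) proportional to exp(Q(a)/alpha).

    alpha is the temperature: large alpha pushes the policy toward uniform
    (more exploration), small alpha toward greedy argmax (pure exploitation).
    Uses the max-shift trick for numerical stability.
    """
    m = max(q / alpha for q in q_values)
    exps = [math.exp(q / alpha - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: estimated values (e.g. negative predicted latency) for three
# hypothetical candidate operator groups; numbers are illustrative only.
q = [-2.0, -1.0, -3.0]
probs = soft_policy(q, alpha=0.5)
best = max(range(len(probs)), key=probs.__getitem__)
```

With these toy values the policy concentrates on the second group (lowest predicted latency) while still assigning nonzero probability to the others, which is the entropy bonus at work: the scheduler keeps sampling alternative operator orderings instead of committing prematurely to one.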
Pages: 40-52
Page count: 13