Scalable Heterogeneous Scheduling Based Model Parallelism for Real-Time Inference of Large-Scale Deep Neural Networks

Cited by: 0

Authors:

Zou, Xiaofeng ^{[1]}

Chen, Cen ^{[1,2]}

Lin, Peiying ^{[3,4]}

Zhang, Luochuan ^{[1]}

Xu, Yanwu ^{[1,2]}

Zhang, Wenjie ^{[5]}

Affiliations:

[1] South China Univ Technol, Sch Future Technol, Guangzhou 510641, Peoples R China

[2] Pazhou Lab, Guangzhou 510330, Peoples R China

[3] Xiangtan Univ XTU, Sch Comp Sci, Xiangtan 411105, Peoples R China

[4] Xiangtan Univ XTU, Sch Cyberspace Sci, Xiangtan 411105, Peoples R China

[5] Hong Kong Polytech Univ, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024 / Vol. 8 / No. 04

Keywords:

Computational modeling; Artificial neural networks; Task analysis; Processor scheduling; Parallel processing; Program processors; Job shop scheduling; Deep neural networks; model parallelism; parallel computing; GRAPHS;

DOI:

10.1109/TETCI.2024.3369628

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scaling up the capacity of deep neural networks (DNNs) is an effective way to improve model quality in many DNN-based applications, and DNN models therefore continue to grow. To improve the execution efficiency of large and complex models, devices are becoming increasingly heterogeneous, combining CPUs with domain-specific hardware accelerators. In many cases, the capacity of large-scale models exceeds the memory limit of a single accelerator. Recent work has shown that model parallelism, which partitions a DNN's computational graph across multiple devices, can not only address this problem but also provide significant performance improvements. In this work, we focus on optimizing model parallelism for timely inference of large-scale DNNs on heterogeneous processors. We transform the computation graphs of DNNs into directed acyclic graphs (DAGs) and propose to use heterogeneous scheduling methods to determine the model partition plan. However, we have found that current efficient DAG scheduling methods leave considerable room for improvement when processing large-scale DAGs and have high computational complexity. To this end, we propose a scalable DAG-partition-assisted scheduling method for heterogeneous processors that addresses these problems. Our approach takes the execution time of DNN models, scalability, and memory constraints into consideration. We demonstrate the effectiveness of our approach using both small- and large-scale DNN models. To the best of our knowledge, this is the first work that explores DAG scheduling and partitioning methods for model parallelism, and it provides new avenues for accelerating large-scale DNN inference.
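To make the setting in the abstract concrete, the following is a minimal sketch of memory-aware list scheduling of an operator DAG on heterogeneous devices. It is not the authors' method: it is a generic greedy earliest-finish-time heuristic, shown only to illustrate the problem shape (per-device run times, cross-device transfer costs, per-device memory limits). The function name `list_schedule`, the three-operator graph, and all cost, memory, and capacity numbers are hypothetical.

```python
from collections import defaultdict


def list_schedule(costs, edges, comm, mem, capacity):
    """Greedy earliest-finish-time placement of a DAG on heterogeneous devices.

    costs[v][d]  : run time of operator v on device d (hypothetical values)
    edges        : list of (u, v) data dependencies
    comm[(u, v)] : transfer time if u and v are placed on different devices
    mem[v]       : memory footprint of operator v
    capacity[d]  : memory limit of device d
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indeg = {v: 0 for v in costs}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm yields a topological order of the operators.
    ready = sorted(v for v in costs if indeg[v] == 0)
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for w in succs[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)

    n_dev = len(capacity)
    dev_free = [0.0] * n_dev  # time at which each device becomes idle
    used = [0.0] * n_dev      # memory already placed on each device
    place, finish = {}, {}
    for v in order:
        best = None
        for d in range(n_dev):
            if used[d] + mem[v] > capacity[d]:
                continue  # placing v here would exceed the device memory
            # Inputs must have arrived; cross-device edges pay a transfer cost.
            data_ready = max(
                (finish[u] + (0.0 if place[u] == d else comm[(u, v)])
                 for u in preds[v]),
                default=0.0,
            )
            end = max(dev_free[d], data_ready) + costs[v][d]
            if best is None or end < best[0]:
                best = (end, d)
        if best is None:
            raise ValueError(f"no device has enough memory for {v!r}")
        end, d = best
        place[v], finish[v] = d, end
        dev_free[d] = end
        used[d] += mem[v]
    return place, max(finish.values())


# Hypothetical three-operator chain on two devices (device 0 and device 1).
costs = {"conv": [4.0, 2.0], "relu": [1.0, 1.0], "fc": [3.0, 6.0]}
edges = [("conv", "relu"), ("relu", "fc")]
comm = {("conv", "relu"): 1.0, ("relu", "fc"): 1.0}
mem = {"conv": 2, "relu": 1, "fc": 2}
capacity = [4, 3]

place, makespan = list_schedule(costs, edges, comm, mem, capacity)
print(place, makespan)
```

In this toy run the memory limit of device 1 forces the last operator onto device 0 even though that adds a transfer, which is the kind of trade-off between execution time and memory constraints that the abstract describes. A greedy heuristic like this one examines every device for every operator, so it gives no indication of how the paper's partition-assisted method achieves scalability on large DAGs.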


Pages: 2962-2973

Number of pages: 12
