Scalable Heterogeneous Scheduling Based Model Parallelism for Real-Time Inference of Large-Scale Deep Neural Networks

Cited by: 0
Authors
Zou, Xiaofeng [1]
Chen, Cen [1, 2]
Lin, Peiying [3, 4]
Zhang, Luochuan [1]
Xu, Yanwu [1, 2]
Zhang, Wenjie [5]
Affiliations
[1] South China Univ Technol, Sch Future Technol, Guangzhou 510641, Peoples R China
[2] Pazhou Lab, Guangzhou 510330, Peoples R China
[3] Xiangtan Univ XTU, Sch Comp Sci, Xiangtan 411105, Peoples R China
[4] Xiangtan Univ XTU, Sch Cyberspace Sci, Xiangtan 411105, Peoples R China
[5] Hong Kong Polytech Univ, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Computational modeling; Artificial neural networks; Task analysis; Processor scheduling; Parallel processing; Program processors; Job shop scheduling; Deep neural networks; model parallelism; parallel computing; GRAPHS;
DOI
10.1109/TETCI.2024.3369628
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scaling up the capacity of deep neural networks (DNNs) is an effective way to improve model quality in many DNN-based applications, so DNN models keep growing. To execute large and complex models efficiently, devices are becoming increasingly heterogeneous, combining CPUs with domain-specific hardware accelerators. In many cases, a large-scale model exceeds the memory limit of a single accelerator. Recent work has shown that model parallelism, which partitions a DNN's computational graph across multiple devices, can not only address this problem but also provide significant performance improvements. In this work, we focus on optimizing model parallelism for timely inference of large-scale DNNs on heterogeneous processors. We transform the computation graphs of DNNs into directed acyclic graphs (DAGs) and propose to use heterogeneous scheduling methods to determine the model partition plan. However, we have found that current efficient DAG scheduling methods leave considerable room for improvement when processing large-scale DAGs and have high computational complexity. To address these problems, we propose a scalable DAG-partition-assisted scheduling method for heterogeneous processors. Our approach takes the execution time of DNN models, scalability, and memory constraints into consideration. We demonstrate the effectiveness of our approach using both small- and large-scale DNN models. To the best of our knowledge, this is the first work that explores DAG scheduling and partitioning methods for model parallelism, and it provides new avenues for accelerating large-scale DNN inference.
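The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of the general idea it describes: treat the model as a DAG of operators and assign each operator to a heterogeneous device under a per-device memory limit. It uses a generic greedy earliest-finish-time list scheduler, not the authors' method. The function name `schedule`, the fixed transfer delay `comm`, and all cost and memory numbers are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def schedule(dag, cost, mem, capacity, comm=1.0):
    """Greedy earliest-finish-time list scheduling (illustrative only).

    dag:      {node: [successor, ...]}; every node must appear as a key
    cost:     {node: [run time on device 0, run time on device 1, ...]}
    mem:      {node: memory footprint}
    capacity: [memory capacity of each device]
    comm:     assumed fixed delay when producer and consumer are on
              different devices
    """
    preds = defaultdict(list)
    indeg = {n: 0 for n in dag}
    for u, succs in dag.items():
        for v in succs:
            preds[v].append(u)
            indeg[v] += 1

    # Kahn's algorithm: visit nodes in a topological order.
    ready = sorted(n for n in dag if indeg[n] == 0)
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for v in dag[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    n_dev = len(capacity)
    free_at = [0.0] * n_dev   # time at which each device becomes idle
    used = [0.0] * n_dev      # memory already committed on each device
    place, finish = {}, {}
    for n in order:
        best = None
        for d in range(n_dev):
            if used[d] + mem[n] > capacity[d]:
                continue  # memory constraint: node does not fit here
            # Inputs arrive when each predecessor finishes, plus a
            # transfer delay if it ran on another device.
            data_ready = max(
                (finish[p] + (comm if place[p] != d else 0.0)
                 for p in preds[n]),
                default=0.0,
            )
            end = max(free_at[d], data_ready) + cost[n][d]
            if best is None or end < best[0]:
                best = (end, d)
        if best is None:
            raise ValueError(f"no device has enough memory for {n!r}")
        end, d = best
        place[n], finish[n] = d, end
        free_at[d] = end
        used[d] += mem[n]
    return place, max(finish.values())


# Hypothetical four-operator graph; device 0 stands in for a CPU and
# device 1 for a small-memory accelerator.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": [2, 4], "b": [3, 1], "c": [3, 1], "d": [2, 2]}
mem = {"a": 1, "b": 1, "c": 1, "d": 1}
placement, makespan = schedule(dag, cost, mem, capacity=[4, 2])
```

A greedy pass like this examines every device for every node and keeps no lookahead, which is one reason plain list scheduling can struggle on very large graphs; the abstract's partition-assisted approach is aimed at that scalability gap, though its details are not given here.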
Pages: 2962-2973
Page count: 12
Related papers
50 in total
  • [1] PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices
    Hu, Yang
    Imes, Connor
    Zhao, Xuanang
    Kundu, Souvik
    Beerel, Peter A.
    Crago, Stephen P.
    Walters, John Paul
    2022 25TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2022, : 298 - 307
  • [2] A MODEL FOR REAL-TIME SIMULATION OF LARGE-SCALE NETWORKS BASED ON NETWORK PROCESSOR
    Xu Xiaobo
    Zheng Kangfeng
    Yang Yixian
    Xu Guoai
    PROCEEDINGS OF 2009 2ND IEEE INTERNATIONAL CONFERENCE ON BROADBAND NETWORK & MULTIMEDIA TECHNOLOGY, 2009, : 237 - 241
  • [3] Constrained large-scale real-time EV scheduling based on recurrent deep reinforcement learning
    Li, Hang
    Li, Guojie
    Lie, Tek Tjing
    Li, Xingzhi
    Wang, Keyou
    Han, Bei
    Xu, Jin
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, 144
  • [4] Heterogeneous model parallelism for deep neural networks
    Moreno-Alvarez, Sergio
    Haut, Juan M.
    Paoletti, Mercedes E.
    Rico-Gallego, Juan A.
    NEUROCOMPUTING, 2021, 441 : 1 - 12
  • [5] Power analysis of large-scale, real-time neural networks on SpiNNaker
    Stromatias, Evangelos
    Galluppi, Francesco
    Patterson, Cameron
    Furber, Steve
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [6] Heterogeneous Scheduling of Deep Neural Networks for Low-power Real-time Designs
    Shea, Colin
    Mohsenin, Tinoosh
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2019, 15 (04)
  • [7] Real-Time Neuromorphic System for Large-Scale Conductance-Based Spiking Neural Networks
    Yang, Shuangming
    Wang, Jiang
    Deng, Bin
    Liu, Chen
    Li, Huiyan
    Fietkiewicz, Chris
    Loparo, Kenneth A.
    IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (07) : 2490 - 2503
  • [8] A primer for real-time simulation of large-scale networks
    Liu, Jason
    41ST ANNUAL SIMULATION SYMPOSIUM, PROCEEDINGS, 2008, : 85 - 94
  • [9] Model-based engineering of large-scale real-time systems
    Bapty, TA
    Sztipanovits, J
    INTERNATIONAL CONFERENCE AND WORKSHOP ON ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 1997, : 467 - 474
  • [10] RACE: A Real-Time Scheduling Policy and Communication Architecture for Large-Scale Wireless Sensor Networks
    Mizanian, Kambiz
    Hajisheykhi, Reza
    Baharloo, Mohammad
    Jahangir, Amir Hossein
    2009 7TH ANNUAL COMMUNICATION NETWORKS AND SERVICES RESEARCH CONFERENCE, 2009, : 458 - 460