SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications

Cited by: 1
Authors
Nuriyev, Emin [1]
Manumachu, Ravi Reddy [1]
Aseeri, Samar [2]
Verma, Mahendra K. [3]
Lastovetsky, Alexey L. [1]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Dublin, Ireland
[2] King Abdullah Univ Sci & Technol KAUST, Extreme Comp Res Ctr ECRC, Thuwal, Saudi Arabia
[3] Indian Inst Technol Kanpur, Dept Phys, Kanpur, India
Keywords
Allreduce communication algorithm; MPI; Parallel deep learning; ResNet-50; Imagenet; HIGH-PERFORMANCE
DOI
10.1016/j.jpdc.2023.104767
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Parallel and distributed deep learning (PDNN) has become an effective strategy to reduce the long training times of large-scale deep neural networks. Mainstream PDNN software packages based on the message-passing interface (MPI) and employing synchronous stochastic gradient descent rely crucially on the performance of the MPI allreduce collective communication routine. In this work, we propose a novel scalable universal allreduce meta-algorithm called SUARA. In general, SUARA consists of L serial steps, where L >= 2, executed by all MPI processes involved in the allreduce operation. At each step, SUARA partitions this set of processes into subsets, which execute optimally selected library allreduce algorithms in parallel to solve sub-allreduce problems on these subsets, so that the whole allreduce operation is accomplished once all L steps are complete. We then design, theoretically study, and implement a two-step SUARA (L = 2), called SUARA2, on top of the Open MPI library. We prove that the theoretical asymptotic speedup of SUARA2 executed by P processes over the base Open MPI routine is O( P). Our experiments on the Shaheen-II supercomputer employing 1024 nodes demonstrate over 2x speedup of SUARA2 over the native Open MPI allreduce routine, which translates into a 9% performance improvement in training the ResNet-50 DNN on ImageNet. (c) 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
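The multi-step idea in the abstract can be illustrated with a minimal, MPI-free sketch for L = 2. It is only an illustration under stated assumptions: the row/column grid partitioning, the function names `allreduce_sum` and `two_step_allreduce`, and the use of a plain elementwise sum as a stand-in for a "library allreduce" are all hypothetical and are not taken from the paper, which may partition the processes and select the sub-algorithms differently.

```python
from typing import List

Vector = List[float]


def allreduce_sum(buffers: List[Vector]) -> List[Vector]:
    """Stand-in for a library allreduce (assumption): every participant
    receives the elementwise sum of all participants' vectors."""
    total = [sum(values) for values in zip(*buffers)]
    return [list(total) for _ in buffers]


def two_step_allreduce(buffers: List[Vector], rows: int, cols: int) -> List[Vector]:
    """Simulate a two-step allreduce over rows * cols processes.

    Process with rank r * cols + c is placed at grid position (r, c).
    This grid layout is an illustrative assumption, not the paper's scheme.
    """
    if len(buffers) != rows * cols:
        raise ValueError("number of buffers must equal rows * cols")
    data = [list(b) for b in buffers]

    # Step 1: partition the processes into `rows` subsets (one per grid row);
    # the subsets are independent, so on real hardware they would run in parallel.
    for r in range(rows):
        group = [r * cols + c for c in range(cols)]
        for rank, vec in zip(group, allreduce_sum([data[i] for i in group])):
            data[rank] = vec

    # Step 2: re-partition into `cols` subsets (one per grid column) and
    # combine the per-row partial results into the global result.
    for c in range(cols):
        group = [r * cols + c for r in range(rows)]
        for rank, vec in zip(group, allreduce_sum([data[i] for i in group])):
            data[rank] = vec

    return data


if __name__ == "__main__":
    # Six simulated processes on a 2 x 3 grid, each contributing [rank, 1.0].
    inputs = [[float(rank), 1.0] for rank in range(6)]
    for rank, vec in enumerate(two_step_allreduce(inputs, rows=2, cols=3)):
        print(rank, vec)
```

After step 1 each process holds the sum over its row, and after step 2 every process holds the sum over all rows, that is, the full allreduce result. In an actual MPI program the subsets would typically be sub-communicators, and each sub-allreduce could use a different library algorithm.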
Pages: 15