SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications

Times cited: 1
Authors
Nuriyev, Emin [1]
Manumachu, Ravi Reddy [1]
Aseeri, Samar [2]
Verma, Mahendra K. [3]
Lastovetsky, Alexey L. [1]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Dublin, Ireland
[2] King Abdullah Univ Sci & Technol KAUST, Extreme Comp Res Ctr ECRC, Thuwal, Saudi Arabia
[3] Indian Inst Technol Kanpur, Dept Phys, Kanpur, India
Keywords
Allreduce communication algorithm; MPI; Parallel deep learning; ResNet-50; ImageNet; HIGH-PERFORMANCE
DOI
10.1016/j.jpdc.2023.104767
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Parallel and distributed deep learning (PDNN) has become an effective strategy to reduce the long training times of large-scale deep neural networks. Mainstream PDNN software packages based on the message-passing interface (MPI) and employing synchronous stochastic gradient descent rely crucially on the performance of the MPI allreduce collective communication routine. In this work, we propose a novel scalable universal allreduce meta-algorithm called SUARA. In general, SUARA consists of L serial steps, where L ≥ 2, executed by all MPI processes involved in the allreduce operation. At each step, SUARA partitions this set of processes into subsets, which execute optimally selected library allreduce algorithms to solve sub-allreduce problems on these subsets in parallel; the whole allreduce operation is accomplished once all L steps are complete. We then design, theoretically study and implement a two-step SUARA (L = 2), called SUARA2, on top of the Open MPI library. We prove that the theoretical asymptotic speedup of SUARA2 executed by P processes over the base Open MPI routine is O(√P). Our experiments on the Shaheen-II supercomputer employing 1024 nodes demonstrate a speedup of over 2x of SUARA2 over the native Open MPI allreduce routine, which translates into a 9% performance improvement in training the ResNet-50 DNN on ImageNet. (c) 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
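The abstract describes SUARA only at the level of partitioning the processes into subsets over L serial steps. The following is a minimal, single-process Python simulation of one way such a multi-step allreduce can be organized. It assumes the P ranks are laid out row-major on an L-dimensional grid and that at step i the ranks differing only in grid coordinate i form one subset; for L = 2 with a near-square grid this gives subsets of roughly √P ranks. The names `near_square_shape`, `subset_allreduce` and `multistep_allreduce`, and the grid layout itself, are hypothetical illustrations rather than the paper's implementation, and the per-subset reduction is a plain elementwise sum standing in for the library allreduce algorithm that SUARA selects for each subset.

```python
from itertools import product
from math import isqrt, prod
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def near_square_shape(p: int) -> Tuple[int, int]:
    """Hypothetical helper: factor p as (a, p // a), a = largest divisor <= sqrt(p)."""
    if p < 1:
        raise ValueError("p must be positive")
    a = isqrt(p)
    while p % a != 0:  # terminates because a == 1 always divides p
        a -= 1
    return (a, p // a)


def subset_allreduce(vectors: Sequence[Vector]) -> Vector:
    """Stand-in for a library allreduce on one subset: elementwise sum."""
    return [sum(xs) for xs in zip(*vectors)]


def multistep_allreduce(data: Dict[int, Vector], shape: Sequence[int]) -> Dict[int, Vector]:
    """Simulate an L-step allreduce, L = len(shape), over P = prod(shape) ranks.

    Assumption (not taken from the paper): ranks are laid out row-major on a grid
    of the given shape, and at step i the ranks that differ only in grid
    coordinate i form one subset.
    """
    p = prod(shape)
    if sorted(data) != list(range(p)):
        raise ValueError("data must hold one vector per rank 0..P-1")
    coords = list(product(*(range(s) for s in shape)))  # coords[rank] -> grid position
    buf = {r: list(v) for r, v in data.items()}  # copy so the inputs are not mutated
    for step in range(len(shape)):
        # Partition the ranks into subsets keyed by all coordinates except `step`.
        subsets: Dict[Tuple[int, ...], List[int]] = {}
        for rank, c in enumerate(coords):
            subsets.setdefault(c[:step] + c[step + 1:], []).append(rank)
        # The subsets are disjoint, so under MPI they could run concurrently;
        # this simulation just processes them one after another.
        for members in subsets.values():
            reduced = subset_allreduce([buf[r] for r in members])
            for r in members:
                buf[r] = list(reduced)
    return buf


if __name__ == "__main__":
    P = 1024
    shape = near_square_shape(P)  # a 32 x 32 grid for P = 1024
    data = {r: [float(r), 1.0] for r in range(P)}
    result = multistep_allreduce(data, shape)
    print(shape, result[0], result[P - 1])
```

After the first step every rank holds the sum over its grid row, and after the second step the sum of those row sums, which is the global result. In a real MPI setting, each step's subsets could be represented as separate sub-communicators (for example, created with `MPI_Comm_split`), so that the independent sub-allreduce operations proceed in parallel.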
Pages: 15
Related papers (50 in total)
  • [21] Communication-Efficient Local Stochastic Gradient Descent for Scalable Deep Learning
    Lee, Sunwoo
    Kang, Qiao
    Agrawal, Ankit
    Choudhary, Alok
    Liao, Wei-keng
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 718 - 727
  • [22] The SELENE Deep Learning Acceleration Framework for Safety-related Applications
    Medina, Laura
    Carrion, Salva
    Andreu, Pablo
    Picornell, Tomas
    Flich, Jose
    Hernandez, Carles
    Sandoval, Michael
    Sainz, Markel
    Lefebvre, Charles-Alexis
    Ronnback, Martin
    Matschnig, Martin
    Wess, Matthias
    Taucher, Herbert
    PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), 2022, : 636 - 639
  • [23] A new parallel deep learning algorithm for breast cancer classification
    Kazemi, Ahmad
    Shiri, Mohammad Ebrahim
    Sheikhahmadi, Amir
    Khodamoradi, Mohamad
    INTERNATIONAL JOURNAL OF NONLINEAR ANALYSIS AND APPLICATIONS, 2021, 12 : 1269 - +
  • [24] A Scalable Parallel Q-Learning Algorithm for Resource Constrained Decentralized Computing Environments
    Camelo, Miguel
    Famaey, Jeroen
    Latre, Steven
    PROCEEDINGS OF 2016 2ND WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC), 2016, : 27 - 35
  • [25] Automatic Pipeline Parallelism: A Parallel Inference Framework for Deep Learning Applications in 6G Mobile Communication Systems
    Shi, Hongjian
    Zheng, Weichu
    Liu, Zifei
    Ma, Ruhui
    Guan, Haibing
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2023, 41 (07) : 2041 - 2056
  • [26] Two Applications of Deep Learning in the Physical Layer of Communication Systems
    Bjornson, Emil
    Giselsson, Pontus
    IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (05) : 134 - 140
  • [27] A Survey on Deep Learning-Based Vehicular Communication Applications
    Lin, Chia-Hung
    Lin, Yu-Chien
    Wu, Yen-Jung
    Chung, Wei-Ho
    Lee, Ta-Sung
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (04): : 369 - 388
  • [28] Applications of Deep Learning to the Design of Enhanced Wireless Communication Systems
    Goutay, Mathieu
    arXiv, 2022,
  • [30] A unified hybrid memory system for scalable deep learning and big data applications
    Rang, Wei
    Liang, Huanghuang
    Wang, Ye
    Zhou, Xiaobo
    Cheng, Dazhao
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, 186