SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications

Cited by: 1
Authors
Nuriyev, Emin [1]
Manumachu, Ravi Reddy [1]
Aseeri, Samar [2]
Verma, Mahendra K. [3]
Lastovetsky, Alexey L. [1]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Dublin, Ireland
[2] King Abdullah Univ Sci & Technol KAUST, Extreme Comp Res Ctr ECRC, Thuwal, Saudi Arabia
[3] Indian Inst Technol Kanpur, Dept Phys, Kanpur, India
Keywords
Allreduce communication algorithm; MPI; Parallel deep learning; ResNet-50; ImageNet; High performance
DOI
10.1016/j.jpdc.2023.104767
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Discipline code
081202
Abstract
Parallel and distributed deep learning (PDNN) has become an effective strategy to reduce the long training times of large-scale deep neural networks. Mainstream PDNN software packages based on the message-passing interface (MPI) and employing synchronous stochastic gradient descent rely crucially on the performance of the MPI allreduce collective communication routine. In this work, we propose a novel scalable universal allreduce meta-algorithm called SUARA. In general, SUARA consists of L serial steps, where L >= 2, executed by all MPI processes involved in the allreduce operation. At each step, SUARA partitions this set of processes into subsets, which execute optimally selected library allreduce algorithms to solve sub-allreduce problems on these subsets in parallel, so that the whole allreduce operation is accomplished after all L steps complete. We then design, theoretically study, and implement a two-step SUARA (L = 2) called SUARA2 on top of the Open MPI library. We prove that the theoretical asymptotic speedup of SUARA2 executed by P processes over the base Open MPI routine is O(√P). Our experiments on the Shaheen-II supercomputer employing 1024 nodes demonstrate over 2x speedup of SUARA2 over the native Open MPI allreduce routine, which translates into a 9% improvement in the training performance of the ResNet-50 DNN on ImageNet. (c) 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
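To make the two-step decomposition concrete, the sketch below is a minimal, hypothetical C/MPI illustration of a SUARA2-style allreduce, assuming P is a perfect square. The P processes are viewed as a √P x √P grid; step 1 runs independent sub-allreduces along the rows, and step 2 completes the reduction along the columns, after which every process holds the global result. The function name two_step_allreduce and the fixed row/column partition are illustrative assumptions, not the authors' implementation, which selects optimal library allreduce algorithms for each step inside Open MPI. Intuitively, if the dominant cost term of the base routine grows linearly with the number of participating processes, two steps over √P-sized subsets cost on the order of 2√P rather than P, which is the source of the O(√P) asymptotic speedup.

    /* Hypothetical sketch of a two-step allreduce in the spirit of SUARA2.
     * Assumes the number of processes is a perfect square; otherwise it
     * falls back to the library routine. Compile with an MPI C compiler
     * (e.g., mpicc) and link with -lm. */
    #include <mpi.h>
    #include <math.h>
    #include <stdlib.h>

    int two_step_allreduce(const double *sendbuf, double *recvbuf,
                           int count, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int q = (int)round(sqrt((double)size));
        if (q * q != size)  /* sketch assumes a perfect-square P */
            return MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE,
                                 MPI_SUM, comm);

        /* Partition the processes into subsets for the two steps:
         * rows of the q x q grid for step 1, columns for step 2. */
        MPI_Comm row_comm, col_comm;
        MPI_Comm_split(comm, rank / q, rank, &row_comm);
        MPI_Comm_split(comm, rank % q, rank, &col_comm);

        double *tmp = malloc((size_t)count * sizeof(double));

        /* Step 1: q-process sub-allreduces within each row, in parallel.
         * Every process now holds its row's partial sum. */
        MPI_Allreduce(sendbuf, tmp, count, MPI_DOUBLE, MPI_SUM, row_comm);

        /* Step 2: sub-allreduces within each column combine one partial
         * sum per row, completing the global reduction on every process. */
        MPI_Allreduce(tmp, recvbuf, count, MPI_DOUBLE, MPI_SUM, col_comm);

        free(tmp);
        MPI_Comm_free(&row_comm);
        MPI_Comm_free(&col_comm);
        return MPI_SUCCESS;
    }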
Pages: 15
Related papers (50 records)
  • [1] Qian, Hong; Sun, Bo; Guo, Yuanjun; Yang, Zhile; Ling, Jun; Feng, Wei. A parallel deep learning algorithm with applications in process monitoring and fault prediction. Computers & Electrical Engineering, 2022, 99.
  • [2] Cheng, Chi-Tung; Wang, Yirui; Chen, Huan-Wu; Hsiao, Po-Meng; Yeh, Chun-Nan; Hsieh, Chi-Hsun; Miao, Shun; Xiao, Jing; Liao, Chien-Hung; Lu, Le. A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nature Communications, 2021, 12 (1).
  • [3] Pumma, Sarunya; Si, Min; Feng, Wu-chun; Balaji, Pavan. Parallel I/O optimizations for scalable deep learning. 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS), 2017: 720-729.
  • [4] Barillaro, Luca; Agapito, Giuseppe; Cannataro, Mario. Scalable deep learning for healthcare: methods and applications. 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (BCB 2022), 2022.
  • [5] Liang, Zhen; Cai, Zhongxuan; Li, Minglong; Yang, Wenjing. Parallel Gym Gazebo: a scalable parallel robot deep reinforcement learning platform. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI 2019), 2019: 206-213.
  • [6] Nguyen, Truong Thao; Wahib, Mohamed. An allreduce algorithm and network co-design for large-scale training of distributed deep learning. 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2021), 2021: 396-405.
  • [7] Hasheminezhad, Bita; Shirzad, Shahrzad; Wu, Nanmiao; Diehl, Patrick; Schulz, Hannes; Kaiser, Hartmut. Towards a scalable and distributed infrastructure for deep learning applications. Proceedings of the 2020 IEEE/ACM 5th Workshop on Deep Learning on Supercomputers (DLS 2020), 2020: 20-30.
  • [8] Chavan, Umesh; Kulkarni, Dinesh. Performance issues of parallel, scalable convolutional neural networks in deep learning. Computing, Communication and Signal Processing (ICCASP 2018), 2019, 810: 333-343.
  • [9] He, Yuanzhi; Sheng, Biao; Li, Yuan; Wang, Changxu; Chen, Xiang; Liu, Jinchao. Applications of deep learning in satellite communication: a survey. Space Information Networks (SINC 2023), 2024, 2057: 17-33.