A Bandwidth-saving Optimization for MPI Broadcast Collective Operation

Cited by: 7
Authors
Zhou, Huan [1]
Marjanovic, Vladimir [1]
Niethammer, Christoph [1]
Gracia, Jose [1]
Affiliations
[1] Univ Stuttgart, High Performance Comp Ctr Stuttgart HLRS, D-70174 Stuttgart, Germany
Keywords
MPICH; Broadcast; Bandwidth-saving
DOI
10.1109/ICPPW.2015.20
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The efficiency and scalability of MPI collective operations, in particular the broadcast operation, play an integral part in high-performance computing applications. MPICH, one of the widely used contemporary MPI software stacks, implements the broadcast operation on top of point-to-point operations. Depending on parameters such as message size and process count, the library chooses among different algorithms, for instance binomial dissemination, recursive-doubling exchange, or ring all-to-all broadcast (allgather). However, the existing broadcast design in the latest release of MPICH does not provide good performance for large messages (lmsg) or for medium messages with non-power-of-two process counts (mmsg-npof2), owing to the suboptimal inner ring allgather algorithm. In this paper, based on the native broadcast design in MPICH, we propose a tuned broadcast approach, designed with bandwidth saving in mind, that caters to the lmsg and mmsg-npof2 cases. Several comparisons of the native and tuned broadcast designs are made for different data sizes and program sizes on a Cray XC40 cluster. The results show that the tuned broadcast design improves performance by 2% to 54% for lmsg and mmsg-npof2 in user-level testing.
Pages: 111-118
Page count: 8
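The abstract describes how the native design picks a broadcast algorithm from message size and process count, and why lmsg and mmsg-npof2 both end up on the ring allgather path. The following is a minimal Python sketch of that selection logic, together with a small simulation of the scatter-plus-ring-allgather pattern. The function names are hypothetical, and the threshold defaults (12288 bytes, 524288 bytes, 8 processes) are illustrative assumptions rather than values taken from the abstract. The sketch models only the native behavior and does not attempt to reproduce the tuned design proposed in the paper.

```python
def choose_bcast_algorithm(nbytes, nprocs,
                           short_msg=12288, long_msg=524288, min_procs=8):
    """Hypothetical selector; all three thresholds are assumed values."""
    is_pof2 = nprocs & (nprocs - 1) == 0
    if nbytes < short_msg or nprocs < min_procs:
        return "binomial"
    if nbytes < long_msg and is_pof2:
        return "scatter + recursive-doubling allgather"
    # Large messages (lmsg) and medium messages on a non-power-of-two
    # process count (mmsg-npof2) both fall through to the ring variant.
    return "scatter + ring allgather"


def ring_allgather_bcast(message, nprocs):
    """Simulate a scatter followed by a ring allgather.

    Ranks are relative to the root. Returns the reassembled buffer of
    every rank and the number of point-to-point sends in the ring phase.
    """
    chunk = -(-len(message) // nprocs)  # ceiling division
    chunks = [message[i * chunk:(i + 1) * chunk] for i in range(nprocs)]

    # State after the scatter phase: rank r holds only chunk r.
    have = [{r: chunks[r]} for r in range(nprocs)]

    sends = 0
    for step in range(nprocs - 1):
        for r in range(nprocs):
            right = (r + 1) % nprocs
            idx = (r - step) % nprocs  # chunk received in the previous step
            have[right][idx] = have[r][idx]
            sends += 1

    buffers = [b"".join(h[i] for i in range(nprocs)) for h in have]
    return buffers, sends


if __name__ == "__main__":
    print(choose_bcast_algorithm(100_000, 48))
    buffers, sends = ring_allgather_bcast(b"abcdefghij", 6)
    print(all(b == b"abcdefghij" for b in buffers), sends)
```

In this simulation every rank forwards one chunk per step for `nprocs - 1` steps, so the ring phase issues `nprocs * (nprocs - 1)` sends in total, regardless of which chunks a rank might already hold.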