Parallel prefix (scan) algorithms for MPI

Authors
Sanders, Peter
Traeff, Jesper Larsson
Affiliations
[1] Univ Karlsruhe, D-76131 Karlsruhe, Germany
[2] NEC Europe Ltd, C&C Res Labs, D-53757 St Augustin, Germany
Keywords
cluster of SMPs; collective communication; MPI implementation; prefix sum; pipelining
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We describe and experimentally compare four theoretically well-known algorithms for the parallel prefix operation (scan, in MPI terms), and give a presumably novel, doubly-pipelined implementation of the in-order binary tree parallel prefix algorithm. Bidirectional interconnects can benefit from this implementation. We present results from a 32-node AMD cluster with Myrinet 2000 and a 72-node SX-8 parallel vector system. The doubly-pipelined algorithm is more than a factor of two faster than the straightforward binomial-tree algorithm found in many MPI implementations. However, due to its small constant factors, the simple, linear pipeline algorithm is preferable for systems with a moderate number of processors. We also discuss adapting the algorithms to clusters of SMP nodes.
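The abstract names the simple linear pipeline algorithm without giving its details. As an illustration only, the following Python sketch simulates a generic blocked linear-pipeline inclusive scan in a single process. It makes no MPI calls and is not the authors' implementation. The function name `linear_pipeline_scan`, the `num_blocks` parameter, and the round-counting scheme are assumptions introduced for this example.

```python
from operator import add


def linear_pipeline_scan(inputs, num_blocks, op=add):
    """Simulate an inclusive scan across p ranks with a linear pipeline.

    inputs[i] is the vector held by (hypothetical) rank i; all vectors have
    the same length. Each vector is cut into num_blocks blocks. In every
    round, rank i-1 forwards one finished block to rank i, which combines
    it with its own data, so different ranks work on different blocks at
    the same time.
    """
    p, m = len(inputs), len(inputs[0])
    # Contiguous index range [lo, hi) of each block.
    bounds = [(k * m // num_blocks, (k + 1) * m // num_blocks)
              for k in range(num_blocks)]
    result = [list(v) for v in inputs]
    # Rank 0 needs no communication; the last block reaches the last rank
    # after p + num_blocks - 2 rounds (assumed round model).
    rounds = p + num_blocks - 2 if p > 1 else 0
    for t in range(rounds):
        for i in range(1, p):
            k = t - (i - 1)  # block that rank i receives in round t
            if 0 <= k < num_blocks:
                lo, hi = bounds[k]
                for j in range(lo, hi):
                    # Left operand is the prefix over ranks 0..i-1, so the
                    # rank order is preserved for non-commutative operators.
                    result[i][j] = op(result[i - 1][j], result[i][j])
    return result, rounds


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    prefixes, rounds = linear_pipeline_scan(data, num_blocks=2)
    print(prefixes, rounds)
```

In this round model, cutting the vectors into more blocks shortens the time each rank waits for its first block but adds rounds, which is the usual trade-off when choosing a pipeline block size.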
Pages: 49-57
Number of pages: 9