Parallel prefix (scan) algorithms for MPI

Authors:

Sanders, Peter

Traeff, Jesper Larsson

Affiliations:

[1] Univ Karlsruhe, D-76131 Karlsruhe, Germany

[2] NEC Europe Ltd, C&C Res Labs, D-53757 St Augustin, Germany

Source:

RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE | 2006, Vol. 4192

Keywords:

cluster of SMPs; collective communication; MPI implementation; prefix sum; pipelining

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Subject classification code:

081202

Abstract:

We describe and experimentally compare four theoretically well-known algorithms for the parallel prefix operation (scan, in MPI terms), and give a presumably novel, doubly-pipelined implementation of the in-order binary tree parallel prefix algorithm. Bidirectional interconnects can benefit from this implementation. We present results from a 32-node AMD cluster with Myrinet 2000 and a 72-node SX-8 parallel vector system. The doubly-pipelined algorithm is more than a factor of two faster than the straightforward binomial-tree algorithm found in many MPI implementations. However, due to its small constant factors, the simple linear pipeline algorithm is preferable for systems with a moderate number of processors. We also discuss adapting the algorithms to clusters of SMP nodes.
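
As a rough illustration of the linear pipeline idea named in the abstract, the following sketch simulates an inclusive scan sequentially in Python. It is not the authors' implementation and uses no MPI: the function name `linear_pipeline_scan`, the `num_blocks` parameter, and the round-by-round schedule are assumptions made for this example. Each simulated rank splits its vector into blocks, and in every round a rank combines one block received from its predecessor with its own data, so later ranks can start before earlier blocks have travelled the whole chain.

```python
from operator import add


def linear_pipeline_scan(inputs, num_blocks, op=add):
    """Simulate an inclusive scan across ranks with a linear pipeline.

    inputs[r] is the input vector of (hypothetical) rank r; all vectors
    must have the same length. The result for rank r at index j is
    inputs[0][j] op inputs[1][j] op ... op inputs[r][j].
    """
    p = len(inputs)
    m = len(inputs[0])
    # Split the index range [0, m) into num_blocks contiguous blocks.
    bounds = [(k * m // num_blocks, (k + 1) * m // num_blocks)
              for k in range(num_blocks)]
    results = [list(v) for v in inputs]

    # In round t, rank r (r >= 1) receives block k = t - (r - 1) from
    # rank r - 1, which finished that block in the previous round.
    for t in range(p + num_blocks - 2):
        for r in range(1, p):
            k = t - (r - 1)
            if 0 <= k < num_blocks:
                lo, hi = bounds[k]
                for j in range(lo, hi):
                    # Predecessor's partial result goes on the left so
                    # non-commutative operators keep rank order.
                    results[r][j] = op(results[r - 1][j], results[r][j])
    return results


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, row in enumerate(linear_pipeline_scan(data, num_blocks=2)):
        print(rank, row)
```

In this simulation the last rank finishes after `p + num_blocks - 2` rounds for `p >= 2` ranks, which is one way to see why a pipeline can be attractive when the number of processors is moderate; the actual cost trade-offs are those measured in the paper, not this sketch.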

Pages: 49-57

Page count: 9
