Optimizing a conjugate gradient solver with non-blocking collective operations

Cited by: 0

Authors:

Hoefler, Torsten [1]

Gottschling, Peter

Rehm, Wolfgang

Lumsdaine, Andrew

Affiliations:

[1] Indiana Univ, Open Syst Lab, Bloomington, IN 47404 USA

[2] Tech Univ Chemnitz, Dept Comp Sci, D-09107 Chemnitz, Germany

Source:

RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE | 2006 / Vol. 4192

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

This paper presents a case study about the applicability and usage of non-blocking collective operations. These operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. We introduce our NBC library, a portable low-overhead implementation of non-blocking collectives on top of MPI-1. We demonstrate the easy usage of the NBC library with the optimization of a conjugate gradient solver with only minor changes to the traditional parallel implementation of the program. The optimized solver runs up to 34% faster and is able to overlap most of the communication. We show that there is, due to the overlap, no performance difference between Gigabit Ethernet and InfiniBand (TM) for our calculation.
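
The overlap pattern the abstract describes (start a collective, do independent computation, wait only when the result is needed) can be illustrated with a minimal single-process sketch. This is not the NBC library's API and not the paper's solver: `start_allreduce_sum` and `overlapped_step` are hypothetical names, and a Python thread pool stands in for the network so that the example runs without MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def start_allreduce_sum(pool, contributions):
    """Hypothetical stand-in for a non-blocking allreduce.

    Submits the reduction to a worker thread and returns a handle
    (a Future) immediately instead of waiting for the result.
    """
    return pool.submit(sum, contributions)


def overlapped_step(pool, contributions, local_vector, scale):
    # 1. Start the "collective"; this call does not wait for completion.
    handle = start_allreduce_sum(pool, contributions)

    # 2. Do local work that does not depend on the reduced value.
    scaled = [scale * x for x in local_vector]

    # 3. Wait for the reduction only at the point where it is needed.
    total = handle.result()
    return total, scaled


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        total, scaled = overlapped_step(
            pool, [1.0, 2.0, 3.0, 4.0], [1.0, 2.0], 0.5
        )
        print(total, scaled)
```

In a real MPI program the handle would be a request object returned by a non-blocking collective call, and the wait in step 3 would be the matching completion call. How much communication is actually hidden depends on how much independent work is available in step 2.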

Pages: 374 - 382

Page count: 9

Related records: 50 in total

[1] Optimizing a conjugate gradient solver with non-blocking collective operations
Hoefler, Torsten
Gottschling, Peter
Lumsdaine, Andrew
Rehm, Wolfgang
PARALLEL COMPUTING, 2007, 33 (09) : 624 - 633
[2] Optimizing non-blocking collective operations for InfiniBand
Hoefler, Torsten
Lumsdaine, Andrew
2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, 2008, : 182 - +
[3] A case for non-blocking collective operations
Hoefler, Torsten
Squyres, Jeffrey M.
Rehm, Wolfgang
Lumsdaine, Andrew
FRONTIERS OF HIGH PERFORMANCE COMPUTING AND NETWORKING - ISPA 2006 WORKSHOPS, PROCEEDINGS, 2006, 4331 : 155 - +
[4] A case for standard non-blocking collective operations
Hoefler, Torsten
Kambadur, Prabhanjan
Graham, Richard L.
Shipman, Galen
Lumsdaine, Andrew
RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, 2007, 4757 : 125 - +
[5] Scalable Non-blocking Preconditioned Conjugate Gradient Methods
Eller, Paul R.
Gropp, William
SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2016, : 204 - 215
[6] Auto-tuning Non-blocking Collective Communication Operations
Barigou, Youcef
Venkatesan, Vishwanath
Gabriel, Edgar
2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 1204 - 1213
[7] Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI
Hoefler, Torsten
Lumsdaine, Andrew
Rehm, Wolfgang
2007 ACM/IEEE SC07 CONFERENCE, 2010, : 127 - +
[8] Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers
Kandalla, K.
Yang, U.
Keasler, J.
Kolev, T.
Moody, A.
Subramoni, H.
Tomko, K.
Vienne, J.
de Supinski, B. R.
Panda, D. K.
2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 1156 - 1167
[9] Optimizing a parallel conjugate gradient solver
Field, MR
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1998, 19 (01) : 27 - 37
[10] Progression of MPI non-blocking collective operations using Hyper-Threading
Miwa, Masahiro
Nakashima, Kohta
23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 163 - 171
