Optimizing a conjugate gradient solver with non-blocking collective operations

Cited by: 0
Authors
Hoefler, Torsten [1]
Gottschling, Peter
Rehm, Wolfgang
Lumsdaine, Andrew
Affiliations
[1] Indiana Univ, Open Syst Lab, Bloomington, IN 47404 USA
[2] Tech Univ Chemnitz, Dept Comp Sci, D-09107 Chemnitz, Germany
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
This paper presents a case study about the applicability and usage of non-blocking collective operations. These operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. We introduce our NBC library, a portable low-overhead implementation of non-blocking collectives on top of MPI-1. We demonstrate the easy usage of the NBC library with the optimization of a conjugate gradient solver with only minor changes to the traditional parallel implementation of the program. The optimized solver runs up to 34% faster and is able to overlap most of the communication. We show that there is, due to the overlap, no performance difference between Gigabit Ethernet and InfiniBand(TM) for our calculation.
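The overlap pattern the abstract describes (start a collective, compute while it is in flight, then wait for it) can be sketched as follows. This is a minimal single-process illustration in Python, not the NBC library or MPI: a thread pool stands in for the network, the latency value is made up, and the names simulated_allreduce, local_work, blocking_step, and overlapped_step are hypothetical helpers introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def simulated_allreduce(partials, latency=0.05):
    """Stand-in for a collective sum; the sleep mimics network latency (assumed value)."""
    time.sleep(latency)
    return sum(partials)


def local_work(n=200_000):
    """Computation that does not depend on the result of the reduction."""
    return sum(i * i for i in range(n))


def blocking_step(partials):
    """Traditional pattern: finish the communication, then compute."""
    total = simulated_allreduce(partials)
    return total, local_work()


def overlapped_step(partials, pool):
    """Non-blocking pattern: start the communication, compute, then wait."""
    request = pool.submit(simulated_allreduce, partials)  # returns a handle immediately
    work = local_work()                                   # runs while the "collective" is in flight
    return request.result(), work                         # block only when the result is needed


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(blocking_step([1.0, 2.0, 3.0]) == overlapped_step([1.0, 2.0, 3.0], pool))
```

Both steps return the same values; only the ordering of communication and computation differs. In the overlapped variant the wall-clock time approaches the longer of the two phases instead of their sum, provided the local work really is independent of the reduction result.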
Pages: 374 - 382
Page count: 9
Related papers
50 in total
  • [1] Optimizing a conjugate gradient solver with non-blocking collective operations
    Hoefler, Torsten
    Gottschling, Peter
    Lumsdaine, Andrew
    Rehm, Wolfgang
    PARALLEL COMPUTING, 2007, 33 (09) : 624 - 633
  • [2] Optimizing non-blocking collective operations for InfiniBand
    Hoefler, Torsten
    Lumsdaine, Andrew
    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, 2008, : 182 - +
  • [3] A case for non-blocking collective operations
    Hoefler, Torsten
    Squyres, Jeffrey M.
    Rehm, Wolfgang
    Lumsdaine, Andrew
    FRONTIERS OF HIGH PERFORMANCE COMPUTING AND NETWORKING - ISPA 2006 WORKSHOPS, PROCEEDINGS, 2006, 4331 : 155 - +
  • [4] A case for standard non-blocking collective operations
    Hoefler, Torsten
    Kambadur, Prabhanjan
    Graham, Richard L.
    Shipman, Galen
    Lumsdaine, Andrew
    RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, 2007, 4757 : 125 - +
  • [5] Scalable Non-blocking Preconditioned Conjugate Gradient Methods
    Eller, Paul R.
    Gropp, William
    SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2016, : 204 - 215
  • [6] Auto-tuning Non-blocking Collective Communication Operations
    Barigou, Youcef
    Venkatesan, Vishwanath
    Gabriel, Edgar
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 1204 - 1213
  • [7] Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI
    Hoefler, Torsten
    Lumsdaine, Andrew
    Rehm, Wolfgang
    2007 ACM/IEEE SC07 CONFERENCE, 2010, : 127 - +
  • [8] Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers
    Kandalla, K.
    Yang, U.
    Keasler, J.
    Kolev, T.
    Moody, A.
    Subramoni, H.
    Tomko, K.
    Vienne, J.
    de Supinski, B. R.
    Panda, D. K.
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 1156 - 1167
  • [9] Optimizing a parallel conjugate gradient solver
    Field, MR
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1998, 19 (01) : 27 - 37
  • [10] Progression of MPI non-blocking collective operations using Hyper-Threading
    Miwa, Masahiro
    Nakashima, Kohta
    23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 163 - 171