Application-bypass reduction for large-scale clusters

Times cited: 0
Authors
Wagner, A [1]
Buntinas, D [1]
Panda, DK [1]
Brightwell, R [1]
Affiliation
[1] Ohio State Univ, Dept Comp & Informat Sci, Network Based Comp Lab, Columbus, OH 43210 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Process skew is an important factor in the performance of parallel applications, especially in large-scale clusters. Reduction is a common collective operation that, by its nature, introduces implicit synchronization among the participating processes and is therefore highly susceptible to performance degradation due to process skew. A collective operation with application-bypass does not require the application to block in order for the operation to make progress, so application-bypass collective operations are highly tolerant of skew. In this paper, we describe the design and implementation of an application-bypass version of the reduction operation in MPICH over GM. We evaluate our implementation on a 16-node cluster. Under conditions of process skew, we find a factor of improvement of up to 3.3 for our application-bypass reduction versus the default MPICH implementation. In addition, this factor of improvement increases with system size, indicating that the application-bypass implementation is more scalable and skew-tolerant than the default non-application-bypass version. This framework is promising for the design and development of high-performance, scalable collective communication libraries for next-generation large-scale clusters.
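The idea that a reduction can make progress without the application blocking can be illustrated with a small event-driven model. This is a minimal sketch under stated assumptions, not the authors' MPICH/GM code. It assumes a binomial reduction tree rooted at rank 0 and a commutative, associative operator such as sum or max. An `on_message` callback stands in for whatever asynchronous mechanism the real implementation uses to handle arriving data. All names (`BypassReduceNode`, `binomial_children`, `binomial_parent`, `run`) are hypothetical. In the model, `enter` combines the local contribution with any partial results that have already arrived. It forwards the partial result to the parent only once all children have reported, and it never waits for late children.

```python
import operator
from collections import deque
from typing import Callable, List, Optional


def binomial_children(rank: int, size: int) -> List[int]:
    """Children of `rank` in a binomial tree rooted at 0 (assumed tree shape)."""
    children, mask = [], 1
    while mask < size and not (rank & mask):
        if rank | mask < size:
            children.append(rank | mask)
        mask <<= 1
    return children


def binomial_parent(rank: int) -> int:
    """Parent in the same tree: clear the lowest set bit (rank 0 is the root)."""
    return rank & (rank - 1)


class BypassReduceNode:
    """Per-rank state for a hypothetical non-blocking (application-bypass style) reduce."""

    def __init__(self, rank: int, size: int, op: Callable, network: deque):
        self.rank, self.op, self.network = rank, op, network
        self.pending = set(binomial_children(rank, size))  # children not yet heard from
        self.acc = None          # partial result accumulated so far
        self.entered = False     # has the application called reduce on this rank?
        self.result: Optional[object] = None  # final value, set on the root only

    def _combine(self, value) -> None:
        # Arrival order varies, so the operator must be commutative and associative.
        self.acc = value if self.acc is None else self.op(self.acc, value)

    def _try_finish(self) -> None:
        # Forward (or finish) only when the local value and all children's data are in.
        if self.entered and not self.pending:
            if self.rank == 0:
                self.result = self.acc
            else:
                self.network.append((binomial_parent(self.rank), self.rank, self.acc))

    def enter(self, value) -> None:
        """Application-side call: contributes `value` and returns without blocking."""
        self.entered = True
        self._combine(value)
        self._try_finish()

    def on_message(self, src: int, value) -> None:
        """Asynchronous handler: a child's partial result arrived, even before `enter`."""
        self.pending.discard(src)
        self._combine(value)
        self._try_finish()


def run(values: list, entry_order: List[int], op: Callable = operator.add):
    """Ranks enter in a skewed order; in-flight messages are delivered after each entry."""
    size = len(values)
    network: deque = deque()
    nodes = [BypassReduceNode(r, size, op, network) for r in range(size)]
    for r in entry_order:
        nodes[r].enter(values[r])
        while network:
            dest, src, payload = network.popleft()
            nodes[dest].on_message(src, payload)
    return nodes[0].result  # None if some rank has not entered yet
```

For example, `run([5, 9, 2, 7], [2, 3, 0, 1], op=max)` returns `9`. Rank 2 enters first, and rank 3's partial result is still folded in later by the handler, without rank 2 waiting inside the call. The root's result depends only on the data, not on the order in which ranks arrive. That independence from arrival order is the skew tolerance the abstract describes.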
Pages: 404-411
Number of pages: 8