PERFORMANCE ANALYSIS AND OPTIMIZATION OF PARALLEL SCIENTIFIC APPLICATIONS ON CMP CLUSTERS

Cited by: 0
Authors
Wu, Xingfu [1]
Taylor, Valerie [1]
Lively, Charles [1]
Sharkawi, Sameh [1]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci, College Stn, TX 77843 USA
Source
Funding
National Science Foundation (USA);
Keywords
performance analysis; performance optimization; chip multiprocessors (CMP); clusters; parallel scientific applications;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Chip multiprocessors (CMPs) are widely used for high-performance computing, and they are increasingly configured hierarchically to form the nodes of a cluster system. A major challenge is using such cluster systems efficiently for large-scale scientific applications. In this paper, we quantify the performance gap that results from using different numbers of processors per node; this gap provides a baseline for the amount of optimization needed when using all processors per node on CMP clusters. We conduct a detailed performance analysis to identify how applications can be modified to use all processors per node efficiently, based on three scientific applications: a 3D particle-in-cell magnetic fusion application, the Gyrokinetic Toroidal Code (GTC); a Lattice Boltzmann Method code for simulating fluid dynamics (LBM); and an advanced Eulerian gyrokinetic-Maxwell equation solver for simulating microturbulent transport in plasma (GYRO). As refinements, we use conventional techniques such as loop blocking, loop unrolling, and loop fusion, and we develop hybrid methods for optimizing MPI_Allreduce and MPI_Reduce. With these optimizations, application performance when using all processors per node improved by up to 18.97% for GTC, 15.77% for LBM, and 12.29% for GYRO on up to 2048 total processors on the CMP clusters.
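The abstract names loop fusion and loop blocking among the conventional refinements but does not show the kernels they were applied to. The C sketch below illustrates only the general form of these two transformations on made-up kernels; the function names (`scale_then_sum_unfused`, `scale_then_sum_fused`, `transpose_blocked`) and the `block` parameter are hypothetical and are not taken from GTC, LBM, or GYRO.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical kernel): two separate passes over the data.
   The second loop re-reads y after the first loop has written it. */
void scale_then_sum_unfused(const double *x, double *y, size_t n,
                            double a, double *sum)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i];

    *sum = 0.0;
    for (size_t i = 0; i < n; i++)
        *sum += y[i];
}

/* Loop fusion: both loop bodies are merged into one pass, so y[i] is
   consumed while it is still in a register or in cache. */
void scale_then_sum_fused(const double *x, double *y, size_t n,
                          double a, double *sum)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i];
        s += y[i];
    }
    *sum = s;
}

/* Loop blocking (tiling): transpose an n x n row-major matrix in
   block x block tiles so that each tile of `in` and `out` can stay
   cache-resident while it is processed. `block` is a tuning
   parameter assumed for this sketch. */
void transpose_blocked(const double *in, double *out, size_t n,
                       size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        for (size_t jj = 0; jj < n; jj += block) {
            /* Clamp tile bounds so n need not be a multiple of block. */
            size_t i_end = (ii + block < n) ? ii + block : n;
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    out[j * n + i] = in[i * n + j];
        }
    }
}
```

Both versions of the scale-and-sum kernel compute the same result; the fused one simply touches memory once per element. Whether either transformation pays off on a given CMP node depends on cache sizes and on how many cores share memory bandwidth, which is the kind of effect the paper measures.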
Pages: 61-74
Number of pages: 14