Scalability of parallel spatial direct numerical simulations on Intel hypercube and IBM SP1 and SP2

Cited by: 1
Authors
Joslin, Ronald D. [1]
Hanebutte, Ulf R. [1]
Zubair, Mohammad [1]
Affiliation
[1] NASA Langley Research Cent, Hampton, United States
Source
Journal of Scientific Computing | 1995 / Vol. 10 / No. 2
Keywords
Boundary layer flow - Calculations - Codes (symbols) - Computation theory - Computer architecture - Data structures - Estimation - Fast Fourier transforms - Laminar flow - Optimization - Parallel processing systems - Turbulent flow
DOI
Not available
Abstract
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers are documented. Spatially evolving disturbances associated with laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided so that estimated computational costs can be matched to the actual costs as the number of grid points changes. As the number of processors increases, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. The speedup is slower than linear because the computational cost is dominated by the FFT routine, which yields less-than-ideal speedups. With appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application.
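The figures quoted in the abstract can be combined into a rough check of the slower-than-linear scaling. The sketch below is only an illustration of that arithmetic, not part of the paper: the function names are hypothetical, and it assumes that "in the same time as" a Cray Y/MP means a relative speed of exactly 1.0 on eight SP1 nodes.

```python
def scaling_efficiency(speed_small, nodes_small, speed_large, nodes_large):
    """Relative parallel efficiency when growing from nodes_small to nodes_large.

    A value of 1.0 would mean linear speedup; values below 1.0 mean
    slower-than-linear speedup.
    """
    relative_speedup = speed_large / speed_small
    node_ratio = nodes_large / nodes_small
    return relative_speedup / node_ratio


def implied_peak(sustained_mflops, fraction_of_peak):
    """Peak rate implied by a sustained rate and its stated fraction of peak."""
    return sustained_mflops / fraction_of_peak


# Speeds relative to one Cray Y/MP, taken from the abstract.
SP1_8_NODES = 1.0   # assumption: "same time as" is read as exactly 1.0
SP1_32_NODES = 2.9  # 32 SP1 nodes are 2.9 times faster than the Y/MP

sp1_efficiency = scaling_efficiency(SP1_8_NODES, 8, SP1_32_NODES, 32)

# 52-56 Mflops is stated to be 45 percent of the single-node peak.
peak_low = implied_peak(52.0, 0.45)
peak_high = implied_peak(56.0, 0.45)

print(f"SP1 efficiency from 8 to 32 nodes: {sp1_efficiency:.3f}")
print(f"Implied SP1 node peak: {peak_low:.1f} to {peak_high:.1f} Mflops")
```

Under these assumptions, quadrupling the SP1 node count yields a 2.9-fold rather than a 4-fold gain, which is consistent with the FFT-dominated cost described above. No eight-node SP2 figure is given in the abstract, so the same ratio cannot be formed for the SP2.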
Pages: 233-269