Large-scale parallelization based on CPU and GPU cluster for cosmological fluid simulations

Cited by: 4
Authors
Meng, Chen [1, 2]
Wang, Long [1]
Cao, Zongyan [1, 3]
Feng, Long-long [4]
Zhu, Weishan [4]
Affiliations
[1] Chinese Acad Sci, Supercomp Ctr, Comp Network Informat Ctr, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Natl Astron Observ, Beijing 100012, Peoples R China
[4] Chinese Acad Sci, Purple Mt Observ, Nanjing 210008, Jiangsu, Peoples R China
Keywords
Cosmological hydrodynamics; WENO; GPU; Hierarchical memory; Heterogeneous; Large-scale
DOI
10.1016/j.compfluid.2014.04.006
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We present our parallel implementation for large-scale cosmological simulations of 3D supersonic fluids on CPU and GPU clusters. Our developments are based on a CPU code named WIGEON. Compared to the original sequential Fortran code, a speedup of 19-31 (depending on the specific GPU card) can be achieved on a single GPU. Furthermore, our results show that the pure MPI parallelization scales very well up to 10 thousand CPU cores. In addition, a hybrid CPU/GPU parallelization scheme is introduced, and a detailed analysis of the speedup and scaling across different numbers of CPU/GPU units is presented (up to 256 GPU cards, due to computing resource limitations). Our high scalability and speedup rely on the domain decomposition approach, optimization of the algorithm, and a series of techniques to optimize the CUDA implementation, especially the memory access pattern on the GPU. We believe this hybrid MPI + CUDA code can be an excellent candidate for 10 Peta-scale computing and beyond. © 2014 Elsevier Ltd. All rights reserved.
Pages: 152 - 158
Number of pages: 7
Related papers
50 in total
  • [21] Optimizing the LINPACK Algorithm for Large-Scale PCIe-Based CPU-GPU Heterogeneous Systems
    Tan, Guangming
    Shui, Chaoyang
    Wang, Yinshan
    Yu, Xianzhi
    Yan, Yujin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (09) : 2367 - 2380
  • [22] A GPU-based Framework for Large-scale Multi-Agent Traffic Simulations
    Sano, Yoshihito
    Fukuta, Naoki
    2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013), 2013, : 262 - 267
  • [23] Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer
    Xu, Chuanfu
    Deng, Xiaogang
    Zhang, Lilun
    Fang, Jianbin
    Wang, Guangxue
    Jiang, Yi
    Cao, Wei
    Che, Yonggang
    Wang, Yongxian
    Wang, Zhenghua
    Liu, Wei
    Cheng, Xinghua
    JOURNAL OF COMPUTATIONAL PHYSICS, 2014, 278 : 275 - 297
  • [24] Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
    Paweł Czarnul
    The Journal of Supercomputing, 2018, 74 : 768 - 786
  • [25] Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
    Czarnul, Paweł
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (02) : 768 - 786
  • [26] Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
    Kreutzer, Moritz
    Hager, Georg
    Wellein, Gerhard
    Pieper, Andreas
    Alvermann, Andreas
    Fehske, Holger
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 417 - 426
  • [27] A novel CPU/GPU simulation environment for large-scale biologically realistic neural modeling
    Hoang, Roger V.
    Tanna, Devyani
    Bray, Laurence C. Jayet
    Dascalu, Sergiu M.
    Harris, Frederick C., Jr.
    FRONTIERS IN NEUROINFORMATICS, 2013, 7
  • [28] A Distributed CPU-GPU Framework for Pairwise Alignments on Large-Scale Sequence Datasets
    Li, Da
    Sajjapongse, Kittisak
    Truong, Huan
    Conant, Gavin
    Becchi, Michela
    PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13), 2013, : 329 - 338
  • [29] Acceleration of Large-Scale FDTD Simulations on High Performance GPU Clusters
    Ong, C.
    Weldon, M.
    Cyca, D.
    Okoniewski, M.
    2009 IEEE ANTENNAS AND PROPAGATION SOCIETY INTERNATIONAL SYMPOSIUM AND USNC/URSI NATIONAL RADIO SCIENCE MEETING, VOLS 1-6, 2009, : 545 - 548
  • [30] Stochastic configuration networks with CPU-GPU implementation for large-scale data analytics
    Li J.
    Wang D.
    Information Sciences, 2024, 667