Towards Efficient Decomposition and Parallelization of MPDATA on Hybrid CPU-GPU Cluster

Cited by: 10
Authors
Wyrzykowski, Roman [1]
Szustak, Lukasz [1]
Rojek, Krzysztof [1]
Tomas, Adam [1]
Affiliation
[1] Czestochowa Tech Univ, PL-42201 Czestochowa, Poland
DOI
10.1007/978-3-662-43880-0_52
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG. New supercomputing architectures based on multi- and many-core processors, such as hybrid CPU-GPU platforms, offer notable advantages over traditional supercomputers. In our previous work we considered the adaptation of 2-dimensional (2D) MPDATA computations to a single CPU-GPU node. The main goal of this paper is to study the principles of an optimal parallel formulation of 3D MPDATA on a heterogeneous CPU-GPU cluster. Such a supercomputer architecture requires not only a different philosophy of memory management than traditional massively parallel supercomputers, but also a comprehensive look at load balancing in the heterogeneous co-processing computing model. In this paper we propose an approach to implementing the 3D MPDATA algorithm on a hybrid CPU-GPU cluster, using a mixture of the MPI, OpenMP, and CUDA programming standards. This approach focuses on the donor-cell numerical scheme and is based on a hierarchical decomposition that covers the cluster level, the distribution of computations between the CPU and GPU components of each node, and the distribution within the CPU and GPU devices. We discuss preliminary performance results for the proposed approach running on a single cluster node consisting of two AMD Opteron Interlagos CPUs and one or two NVIDIA Fermi GPUs.
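For readers unfamiliar with the donor-cell scheme named in the abstract, the sketch below shows one first-order upwind (donor-cell) update in 1D. It is an illustration only, not the authors' 3D MPI/OpenMP/CUDA implementation. The function names, the periodic boundary, and the convention that `courant[i]` is the Courant number on the face between cell `i` and cell `i + 1` are all assumptions made for this example.

```python
def donor_cell_flux(psi_left, psi_right, courant):
    """Upwind flux through one face: the upstream ("donor") cell supplies the value."""
    return max(courant, 0.0) * psi_left + min(courant, 0.0) * psi_right


def donor_cell_step(psi, courant):
    """Advance a 1D periodic field by one donor-cell time step.

    psi[i]     -- value of the advected scalar in cell i
    courant[i] -- Courant number on the face between cell i and cell (i + 1) % n
                  (assumed layout, chosen for this sketch)
    """
    n = len(psi)
    # Flux through the right-hand face of every cell.
    flux = [donor_cell_flux(psi[i], psi[(i + 1) % n], courant[i]) for i in range(n)]
    # Each cell loses its right-face flux and gains its left-face flux;
    # flux[i - 1] wraps to the last face for i == 0 (periodic boundary).
    return [psi[i] - (flux[i] - flux[i - 1]) for i in range(n)]


if __name__ == "__main__":
    field = [0.0, 1.0, 0.0, 0.0]
    print(donor_cell_step(field, [0.5] * 4))  # [0.0, 0.5, 0.5, 0.0]
```

With a uniform Courant number of 0.5, half of the pulse moves one cell to the right and the total is conserved. Each cell depends only on its immediate neighbours, which is the kind of locality that a decomposition across nodes and devices can exploit, at the cost of exchanging boundary values.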
Pages: 457-464
Number of pages: 8