Towards Efficient Decomposition and Parallelization of MPDATA on Hybrid CPU-GPU Cluster

Cited by: 10

Authors:

Wyrzykowski, Roman [1]

Szustak, Lukasz [1]

Rojek, Krzysztof [1]

Tomas, Adam [1]

Affiliation:

[1] Czestochowa Tech Univ, PL-42201 Czestochowa, Poland

Source:

LARGE-SCALE SCIENTIFIC COMPUTING, LSSC 2013 | 2014 / Vol. 8353

DOI:

10.1007/978-3-662-43880-0_52

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The multidimensional positive definite advection transport algorithm (MPDATA) is among the most time-consuming components of EULAG. New supercomputing architectures based on multi- and many-core processors, such as hybrid CPU-GPU platforms, offer notable advantages over traditional supercomputers. In our previous work, we considered adapting 2-dimensional (2D) MPDATA computations to a single CPU-GPU node. The main goal of this paper is to study the tenets of an optimal parallel formulation of 3D MPDATA on a heterogeneous CPU-GPU cluster. Such a supercomputer architecture requires not only a different philosophy of memory management than traditional massively parallel supercomputers, but also a comprehensive look at load balancing in the heterogeneous co-processing computing model. In this paper, we propose an approach to implementing the 3D MPDATA algorithm on a hybrid CPU-GPU cluster, using a mixture of the MPI, OpenMP, and CUDA programming standards. This approach focuses on the donor-cell numerical scheme and is based on a hierarchical decomposition that includes the cluster level, the distribution of computations between the CPU and GPU components of each node, and the distribution within the CPU and GPU devices. We discuss preliminary performance results for the proposed approach running on a single cluster node consisting of two AMD Opteron Interlagos CPUs and one or two NVIDIA Fermi GPUs.
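
The abstract names the donor-cell scheme but gives no formulas, so the following is only a minimal illustrative sketch of a first-order donor-cell (upwind) update, not the authors' implementation. It assumes a 1D periodic grid and a single constant Courant number, whereas the paper targets 3D; the function names `donor_cell_flux` and `donor_cell_step` are hypothetical.

```python
def donor_cell_flux(psi_left, psi_right, courant):
    """Upwind flux through one cell face.

    The face takes its value from the upstream ("donor") cell:
    the left cell if the Courant number is positive, the right cell otherwise.
    """
    return max(courant, 0.0) * psi_left + min(courant, 0.0) * psi_right


def donor_cell_step(psi, courant):
    """Advance a 1D periodic field by one donor-cell time step.

    Assumption: `courant` is constant over the grid and |courant| <= 1.
    """
    n = len(psi)
    # flux[i] is the flux through the face between cell i-1 and cell i;
    # psi[i - 1] with i == 0 wraps to the last cell (periodic boundary).
    flux = [donor_cell_flux(psi[i - 1], psi[i], courant) for i in range(n)]
    # Each cell loses the flux leaving through its right face
    # and gains the flux entering through its left face.
    return [psi[i] - (flux[(i + 1) % n] - flux[i]) for i in range(n)]


if __name__ == "__main__":
    field = [0.0, 1.0, 0.0, 0.0]
    print(donor_cell_step(field, 0.5))
```

Under these assumptions the update conserves the sum of the field and keeps a non-negative field non-negative, which is the property that motivates using it as the basic step of a positive definite advection scheme.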

Pages: 457-464

Number of pages: 8
