A hierarchical parallel implementation for heterogeneous computing. Application to algebra-based CFD simulations on hybrid supercomputers

Cited by: 13

Authors:

Alvarez-Farre, Xavier ^{[1]}

Gorobets, Andrey ^{[2]}

Trias, F. Xavier ^{[1]}

Affiliations:

[1] Tech Univ Catalonia, Heat & Mass Transfer Technol Ctr, Carrer Colom 11, Terrassa 08222, Barcelona, Spain

[2] Russian Acad Sci, Keldysh Inst Appl Math, Miusskaya Sq 4, Moscow 125047, Russia

Source:

COMPUTERS & FLUIDS | 2021 / Vol. 214

Funding:

Russian Science Foundation;

Keywords:

Parallel CFD; SpMV; Heterogeneous computing; Hybrid supercomputer; CPU+GPU; MPI+OpenMP+OpenCL; CUDA;

DOI:

10.1016/j.compfluid.2020.104768

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The quest for new portable implementations of simulation algorithms is motivated by the increasing variety of computing architectures. Moreover, the hybridization of high-performance computing systems imposes additional constraints, since heterogeneous computations are needed to efficiently engage processors and massively-parallel accelerators. This, in turn, involves different parallel paradigms and computing frameworks and requires complex data exchanges between computing units. Typically, simulation codes rely on sophisticated data structures and computing subroutines, so-called kernels, which makes portability terribly cumbersome. Thus, a natural way to achieve portability is to dramatically reduce the complexity of both data structures and computing kernels. In our algebra-based approach, the scale-resolving simulation of incompressible turbulent flows on unstructured meshes relies on three fundamental kernels: the sparse matrix-vector product, the linear combination of vectors and the dot product. It is noteworthy that this approach is not limited to a particular kind of numerical method or a set of governing equations. In our code, an auto-balanced multilevel partitioning distributes workload among computing devices of various architectures. The overlap of computations and multistage communications efficiently hides the data-exchange overhead in large-scale supercomputer simulations. In addition to computing on accelerators, special attention is paid to efficiency on manycore processors in multiprocessor nodes with a significant non-uniform memory access factor. Parallel efficiency and performance are studied in detail for different execution modes on various supercomputers using up to 9,600 processor cores and up to 256 graphics processor units. The heterogeneous implementation model described in this work is a general-purpose approach that is well suited for various subroutines in numerical simulation codes. © 2020 Elsevier Ltd. All rights reserved.
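To make the three kernels named in the abstract concrete, the following is a minimal serial sketch in plain Python. It is not the authors' implementation: the function names (`spmv_csr`, `axpby`, `dot`), the choice of CSR storage, and the small tridiagonal test matrix are all assumptions made for illustration, and none of the MPI, OpenMP, OpenCL or CUDA parallelization discussed in the paper is shown.

```python
# Illustrative sketch only: serial versions of the three algebraic kernels
# mentioned in the abstract. All names and the CSR layout are assumptions.

def spmv_csr(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def axpby(a, x, b, y):
    """Linear combination of vectors: returns a*x + b*y element-wise."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical test data: the 3x3 tridiagonal matrix
# [[ 2, -1,  0],
#  [-1,  2, -1],
#  [ 0, -1,  2]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)   # [0.0, 0.0, 4.0]
r = axpby(1.0, x, -0.5, y)                  # [1.0, 2.0, 1.0]
s = dot(x, y)                               # 12.0
```

Restricting a solver to a small kernel set of this kind is what the abstract presents as the route to portability: only these few routines would need a device-specific version, while the rest of the algorithm is expressed in terms of them.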

Pages: 10

12 entries in total

[1] HPC2-A fully-portable, algebra-based framework for heterogeneous computing. Application to CFD
Alvarez, X.
Gorobets, A.
Trias, F. X.
Borrell, R.
Oyarzun, G.
[J]. COMPUTERS & FLUIDS, 2018, 173 : 285 - 292
[2] Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
Oyarzun, Guillermo
Borrell, Ricard
Gorobets, Andrey
Oliva, Assensi
[J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL FLUID DYNAMICS, 2017, 31 (09) : 396 - 411
[3] The Hierarchical Heterogeneous of Parallel Computing Model Based on Method Library
Duan, Jibing
Ji, Xiaopeng
Dou, Jinye
Wei, Zhiqiang
[J]. PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 255 - 263
[4] Implementation of Motion Estimation Based On Heterogeneous Parallel Computing System with OpenCL
Zhang, Jinglin
Nezan, Jean-Francois
Cousin, Jean-Gabriel
[J]. 2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 41 - 45
[5] A hybrid optimization based on GPU parallel computing method and its application of three dimensional large eddy simulations
Zhang, Yuxuan
Wu, Songping
[J]. APPLIED MECHANICS AND MATERIALS I, PTS 1-3, 2013, 275-277 : 2589 - 2594
[6] SunwayLB: Enabling Extreme-Scale Lattice Boltzmann Method Based Computing Fluid Dynamics Simulations on Advanced Heterogeneous Supercomputers
Liu, Zhao
Chu, Xuesen
Lv, Xiaojing
Meng, Hongsong
Liu, Hanyue
Zhu, Guanghui
Fu, Haohuan
Yang, Guangwen
[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (02) : 324 - 337
[7] A hybrid MPI-OpenMP parallel implementation for pseudospectral simulations with application to Taylor-Couette flow
Shi, Liang
Rampp, Markus
Hof, Bjoern
Avila, Marc
[J]. COMPUTERS & FLUIDS, 2015, 106 : 1 - 11
[8] Large-scale homo- and heterogeneous parallel paradigm design based on CFD application PHengLEI
Wan, Yunbo
Zhao, Zhong
Liu, Jie
Zhang, Laiping
Zhang, Yong
Chen, Jianqiang
[J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (05):
[9] A New Hybrid Hierarchical Parallel Algorithm to Enhance the Performance of Large-Scale Structural Analysis Based on Heterogeneous Multicore Clusters
Yu, Gaoyuan
Lou, Yunfeng
Dong, Hang
Li, Junjie
Jin, Xianlong
[J]. CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 136 (01) : 135 - 155
[10] A NEW SOFTWARE IMPLEMENTATION OF ULTRASONIC PHASED ARRAY INSPECTION AND 3D IMAGING APPLICATION BASED ON GPU PARALLEL COMPUTING
Gao, Da-liang
Shi, Fang-fang
Zhang, Bi-xing
[J]. PROCEEDINGS OF 2016 SYMPOSIUM ON PIEZOELECTRICITY, ACOUSTIC WAVES, AND DEVICE APPLICATIONS (SPAWDA), 2016, : 131 - 134
