Enhancing MPI+OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support

Cited by: 1

Authors:

Ferat, Manuel [1]

Pereira, Romain [2,4,5]

Roussel, Adrien [3,4]

Carribault, Patrick [3,4]

Steffenel, Luiz-Angelo [1]

Gautier, Thierry [5]

Affiliations:

[1] Univ Reims, LRC DIGIT, LICIIS, F-51097 Reims, France

[2] CEA, DAM, DIF, F-91297 Arpajon, France

[3] CEA, DAM, DIF, LRC DIGIT, F-91297 Arpajon, France

[4] Univ Paris Saclay, CEA, Lab Informat Haute Performance Calcul & Simulat, F-91680 Bruyeres Le Chatel, France

[5] ENS Lyon, LIP, Project Team AVALON INRIA, Lyon, France

Source:

OPENMP IN A MODERN WORLD: FROM MULTI-DEVICE SUPPORT TO META PROGRAMMING | 2022 / Vol. 13527

Keywords:

OpenMP; GPU Computing; Distributed Application; Task programming

DOI:

10.1007/978-3-031-15922-0_1

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Heterogeneous supercomputers are now widespread among HPC systems, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express them as tasks with dependencies. Therefore, heterogeneous machines can be programmed using MPI+OpenMP(task+target) to expose a very high level of concurrent asynchronous operations in which data transfers, kernel executions, communications and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of other tasks. Suspended tasks can resume, opportunistically at every scheduling point, once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall HPC performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.

Pages: 3-16

Page count: 14
