Enhancing MPI plus OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support

Cited by: 1

Authors:

Ferat, Manuel [1]

Pereira, Romain [2,4,5]

Roussel, Adrien [3,4]

Carribault, Patrick [3,4]

Steffenel, Luiz-Angelo [1]

Gautier, Thierry [5]

Affiliations:

[1] Univ Reims, LRC DIGIT, LICIIS, F-51097 Reims, France

[2] CEA, DAM, DIF, F-91297 Arpajon, France

[3] CEA, DAM, DIF, LRC DIGIT, F-91297 Arpajon, France

[4] Univ Paris Saclay, CEA, Lab Informat Haute Performance Calcul & Simulat, F-91680 Bruyeres Le Chatel, France

[5] ENS Lyon, LIP, Project Team AVALON INRIA, Lyon, France

Source:

OPENMP IN A MODERN WORLD: FROM MULTI-DEVICE SUPPORT TO META PROGRAMMING | 2022 / Vol. 13527

Keywords:

OpenMP; GPU Computing; Distributed Application; Task programming

DOI:

10.1007/978-3-031-15922-0_1

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Heterogeneous supercomputers are widespread in HPC, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express them as tasks with dependencies. Therefore, heterogeneous machines can be programmed using MPI+OpenMP(task+target) to expose a very high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of other tasks. Suspended tasks can resume, opportunistically at every scheduling point, once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall HPC performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
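
For illustration only, the following is a minimal sketch of how an AXPY kernel could be expressed as asynchronous target tasks with dependencies using standard OpenMP 4.5 directives. It is not taken from the paper or from the MPC framework: the function name `axpy_tasks` and the `chunk` parameter are hypothetical, and the task suspension described in the abstract happens inside the runtime, so it is not visible here. Without a GPU, or without OpenMP enabled at compile time, the loops simply run on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): computes y = a*x + y.
 * The vectors are split into chunks; each chunk becomes one deferred
 * target task (nowait) carrying data dependencies (depend), so that
 * transfers and kernels of different chunks may overlap. */
void axpy_tasks(size_t n, double a, const double *x, double *y, size_t chunk)
{
    assert(chunk > 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (size_t begin = 0; begin < n; begin += chunk) {
            size_t len = (n - begin < chunk) ? (n - begin) : chunk;
            const double *xc = x + begin;  /* read-only input chunk */
            double *yc = y + begin;        /* chunk updated in place */

            /* One asynchronous offload task per chunk. */
            #pragma omp target teams distribute parallel for nowait \
                map(to: xc[0:len]) map(tofrom: yc[0:len]) \
                depend(in: xc[0:len]) depend(inout: yc[0:len])
            for (size_t i = 0; i < len; ++i)
                yc[i] = a * xc[i] + yc[i];
        }

        /* Wait for all chunk tasks created above. */
        #pragma omp taskwait
    }
}
```

A caller would fill `x` and `y`, then invoke for example `axpy_tasks(n, 2.0, x, y, 1024)`. In an MPI+OpenMP setting, communication tasks with matching `depend` clauses could be interleaved with these offload tasks; that part is left out of this sketch.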


Pages: 3-16

Number of pages: 14
