Enhancing MPI plus OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support

Cited by: 1
Authors
Ferat, Manuel [1]
Pereira, Romain [2, 4, 5]
Roussel, Adrien [3, 4]
Carribault, Patrick [3, 4]
Steffenel, Luiz-Angelo [1]
Gautier, Thierry [5]
Affiliations
[1] Univ Reims, LRC DIGIT, LICIIS, F-51097 Reims, France
[2] CEA, DAM, DIF, F-91297 Arpajon, France
[3] CEA, DAM, DIF, LRC DIGIT, F-91297 Arpajon, France
[4] Univ Paris Saclay, CEA, Lab Informat Haute Performance Calcul & Simulat, F-91680 Bruyeres Le Chatel, France
[5] ENS Lyon, LIP, Project Team AVALON INRIA, Lyon, France
Keywords
OpenMP; GPU Computing; Distributed Application; Task programming
DOI
10.1007/978-3-031-15922-0_1
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Heterogeneous supercomputers are now widespread in HPC, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express them as tasks with dependencies. Heterogeneous machines can therefore be programmed using MPI+OpenMP(task+target) to expose a very high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of other tasks. Suspended tasks can resume, opportunistically at any scheduling point, once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
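As an illustration of the programming style the abstract refers to, the following is a minimal sketch of an AXPY kernel offloaded as an OpenMP target task with dependencies. It uses only standard OpenMP 4.5 directives (`target ... nowait`, `depend`, `task`); it does not reproduce the task suspend/resume mechanism implemented in MPC, and it omits MPI entirely. The function names `axpy_async` and `run_axpy_demo` are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: y = a*x + y, submitted as a deferred target task.
 * "nowait" lets the encountering thread continue; "depend" orders this
 * task against other sibling tasks that touch y. */
static void axpy_async(int n, double a, const double *x, double *y)
{
    #pragma omp target teams distribute parallel for nowait \
            map(to: x[0:n]) map(tofrom: y[0:n]) depend(inout: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical driver: overlaps the offloaded AXPY with an independent
 * CPU task, then reduces y in a task that depends on the AXPY result.
 * Returns the sum of y after the update. */
double run_axpy_demo(int n, double a)
{
    double *x = malloc((size_t)n * sizeof *x);
    double *y = malloc((size_t)n * sizeof *y);
    assert(x != NULL && y != NULL);

    double sum_x = 0.0, sum_y = 0.0;
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    #pragma omp parallel
    #pragma omp single
    {
        /* Offloaded kernel (falls back to the host if no device exists). */
        axpy_async(n, a, x, y);

        /* Independent CPU work: may run while the kernel is in flight. */
        #pragma omp task shared(sum_x)
        {
            for (int i = 0; i < n; ++i)
                sum_x += x[i];
        }

        /* Consumer task: waits for the AXPY through the dependence on y. */
        #pragma omp task depend(in: y[0:n]) shared(sum_y)
        {
            for (int i = 0; i < n; ++i)
                sum_y += y[i];
        }

        #pragma omp taskwait
    }

    (void)sum_x; /* computed only to illustrate overlap */
    free(x);
    free(y);
    return sum_y;
}
```

Calling `run_axpy_demo(1000, 3.0)` updates every element of `y` to `3.0 * 1.0 + 2.0` and returns their sum. Built without OpenMP support, the pragmas are ignored and the same code runs sequentially, which makes the sketch easy to check on a machine without a GPU.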
Pages: 3-16
Number of pages: 14