Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications

Authors
Tallada, Marc Gonzalez [1, 2]
Morancho, Enric [1]
Affiliations
[1] Univ Politecn Catalunya BarcelonaTECH, Comp Architecture Dept, Barcelona, Spain
[2] Univ Politecn Catalunya BarcelonaTECH, Comp Architecture Dept, Jordi Girona 1-3, Barcelona 08034, Spain
Keywords
Heterogeneous programming; hybrid CPU-GPU; OpenMP; CUDA; HIP; PRACTICAL SCHEDULING SCHEME; PARALLEL; MULTICORE; MEMORY; IMPLEMENTATION; SYSTEMS; MPI;
DOI
10.1177/10943420231188079
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Hybrid computer systems combine compute units (CUs) of different kinds, such as CPUs, GPUs and FPGAs. Exploiting the computing power of these CUs simultaneously requires carefully decomposing an application into balanced parallel tasks, taking into account both the performance of each CU type and the communication costs among them. This paper describes the design and implementation of runtime support for hybrid GPU-CPU OpenMP applications that are combined with GPU-oriented programming models (e.g. CUDA/HIP). As a case study, it presents a hybrid multi-level parallelization of the NPB-MZ benchmark suite. The implementation exploits both coarse-grain and fine-grain parallelism, mapped to compute units of different kinds (GPUs and CPUs). The runtime support bridges OpenMP and HIP by introducing two abstractions: Computing Unit and Data Placement. We compare hybrid and non-hybrid executions under the state-of-the-art OpenMP schedulers, namely static and dynamic task scheduling. We then extend the set of schedulers with two additional variants: a memorizing-dynamic task scheduling and a profile-based static task scheduling. On a computing node composed of one AMD EPYC 7742 @ 2.25 GHz (64 cores with 2 threads/core, totalling 128 threads per node) and two AMD Radeon Instinct MI50 GPUs with 32 GB, hybrid executions achieve speedups from 1.10x up to 3.5x with respect to a non-hybrid GPU implementation, depending on the number of activated CUs.
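The abstract names a profile-based static task scheduling but does not give its algorithm. The following is only an illustrative sketch of the general idea, assuming a greedy "earliest finish time" heuristic over profiled per-task costs; it is not the paper's implementation. The function name, the CU labels ("gpu0", "cpu") and the cost figures are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: task_cost[t][cu] = measured seconds of task t on CU cu.
TaskCost = List[Dict[str, float]]


def profile_based_static_schedule(
    task_cost: TaskCost, cu_names: List[str]
) -> Tuple[Dict[str, List[int]], Dict[str, float]]:
    """Assign every task to one CU before execution, using profiled costs.

    Assumed heuristic (not taken from the paper): visit tasks from heaviest
    to lightest and place each one on the CU that would finish it earliest.
    """
    finish = {cu: 0.0 for cu in cu_names}  # accumulated busy time per CU
    plan: Dict[str, List[int]] = {cu: [] for cu in cu_names}

    # Heaviest tasks first, ranked by their best-case cost on any CU.
    order = sorted(
        range(len(task_cost)),
        key=lambda t: min(task_cost[t][cu] for cu in cu_names),
        reverse=True,
    )
    for t in order:
        # Pick the CU with the earliest completion time for this task.
        best = min(cu_names, key=lambda cu: finish[cu] + task_cost[t][cu])
        plan[best].append(t)
        finish[best] += task_cost[t][best]
    return plan, finish


# Made-up profile: four tasks, each 4x slower on the CPU than on the GPU.
costs = [
    {"gpu0": 1.0, "cpu": 4.0},
    {"gpu0": 1.0, "cpu": 4.0},
    {"gpu0": 0.5, "cpu": 2.0},
    {"gpu0": 0.5, "cpu": 2.0},
]
plan, finish = profile_based_static_schedule(costs, ["gpu0", "cpu"])
makespan = max(finish.values())
```

With this made-up profile, running every task on the GPU alone would take 3.0 s, whereas the sketch places one light task on the CPU and finishes in 2.5 s. This illustrates why a slower CU can still shorten a hybrid execution when the partition is balanced.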
Pages: 626-646
Number of pages: 21