Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Cited by: 0
Authors
Sourouri, Mohammed [1, 2]
Baden, Scott B. [3]
Cai, Xing [1, 2]
Affiliations
[1] Simula Res Lab, Oslo, Norway
[2] Univ Oslo, Dept Informat, Oslo, Norway
[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
Keywords
Source-to-source translation; Code generation; Code optimization; CUDA; OpenMP; MPI; Stencil computation; Heterogeneous computing; CPU plus GPU computing; CODE;
DOI
10.1007/s10766-016-0454-1
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data motion by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
Pages: 711 - 729
Page count: 19
Related papers
50 items in total
  • [21] GPU-ACCELERATED INTERACTIVE VISUALIZATION OF 3D VOLUMETRIC DATA USING CUDA
    Kumar, Piyush
    Agrawal, Anupam
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2013, 13 (02)
  • [22] Direct 3D Aerodynamic Optimization of Turbine Blades with GPU-Accelerated CFD
    Amtsfeld, Philipp
    Bestle, Dieter
    Meyer, Marcus
    ADVANCES IN EVOLUTIONARY AND DETERMINISTIC METHODS FOR DESIGN, OPTIMIZATION AND CONTROL IN ENGINEERING AND SCIENCES, 2015, 36 : 197 - 207
  • [23] A GPU-Accelerated TLSPH Algorithm for 3D Geometrical Nonlinear Structural Analysis
    He, Jiandong
    Lei, Juanmian
    INTERNATIONAL JOURNAL OF COMPUTATIONAL METHODS, 2019, 16 (07)
  • [24] 3D GPU-Accelerated Secondary Checks of Radiation Therapy Treatment Plans
    Clemente, F.
    Perez, C.
    MEDICAL PHYSICS, 2014, 41 (06) : 222 - 222
  • [25] GPU-accelerated blind and robust 3D mesh watermarking by geometry image
    Hung-Kuang Chen
    Wei-Sung Chen
    Multimedia Tools and Applications, 2016, 75 : 10077 - 10096
  • [26] GPU-Accelerated Descriptor Extraction Process for 3D Registration in Augmented Reality
    Garrett, Timothy
    Radkowski, Rafael
    Sheaffer, Jeremy
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3085 - 3090
  • [27] An analytical GPU performance model for 3D stencil computations from the angle of data traffic
    Su, Huayou
    Cai, Xing
    Wen, Mei
    Zhang, Chunyuan
    JOURNAL OF SUPERCOMPUTING, 2015, 71 (07) : 2433 - 2453
  • [28] GPU-accelerated elastic 3D image registration for intra-surgical applications
    Ruijters, Daniel
    Romeny, Bart M. ter Haar
    Suetens, Paul
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2011, 103 (02) : 104 - 112
  • [29] GPU-accelerated Matrix-Free 3D Ultrasound Reconstruction for Nondestructive Testing
    Kirchhof, Jan
    Semper, Sebastian
    Roemer, Florian
    2018 IEEE INTERNATIONAL ULTRASONICS SYMPOSIUM (IUS), 2018,
  • [30] Development of a GPU-accelerated 3D neutron dynamics code for PB-FHR
    E, Yanzhi
    Zou, Yang
    Guo, Wei
    Dai, Ye
    Xu, Hongjie
    NUCLEAR ENGINEERING AND DESIGN, 2017, 320 : 88 - 102