Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Cited by: 0
Authors
Sourouri, Mohammed [1, 2]
Baden, Scott B. [3]
Cai, Xing [1, 2]
Affiliations
[1] Simula Res Lab, Oslo, Norway
[2] Univ Oslo, Dept Informat, Oslo, Norway
[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
Keywords
Source-to-source translation; Code generation; Code optimization; CUDA; OpenMP; MPI; Stencil computation; Heterogeneous computing; CPU plus GPU computing; CODE
DOI
10.1007/s10766-016-0454-1
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data motion by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
Pages: 711 - 729
Number of pages: 19
Related papers
50 records in total
  • [1] Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
    Mohammed Sourouri
    Scott B. Baden
    Xing Cai
    International Journal of Parallel Programming, 2017, 45 (3) : 711 - 729
  • [2] On the GPU Performance of 3D Stencil Computations Implemented in OpenCL
    Su, Huayou
    Wu, Nan
    Wen, Mei
    Zhang, Chunyuan
    Cai, Xing
    SUPERCOMPUTING (ISC 2013), 2013, 7905 : 125 - 135
  • [3] GPU-Accelerated 3D Normal Distributions Transform
    Nguyen, Anh
    Cano, Abraham Monrroy
    Edahiro, Masato
    Kato, Shinpei
    JOURNAL OF ROBOTICS AND MECHATRONICS, 2023, 35 (02) : 445 - 459
  • [4] GPU-accelerated feature tracking for 3D reconstruction
    Cao, Mingwei
    Jia, Wei
    Li, Shujie
    Li, Yujie
    Zheng, Liping
    Liu, Xiaoping
    OPTICS AND LASER TECHNOLOGY, 2019, 110 : 165 - 175
  • [5] GPU-accelerated Parallel 3D Image Thinning
    Hu, Bingfeng
    Yang, Xuan
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013 : 149 - 152
  • [6] GPU-Accelerated Nearest Neighbor Search for 3D Registration
    Qiu, Deyuan
    May, Stefan
    Nuechter, Andreas
    COMPUTER VISION SYSTEMS, PROCEEDINGS, 2009, 5815 : 194 - +
  • [7] GPU-accelerated denoising of 3D magnetic resonance images
    Howison, Mark
    Bethel, E. Wes
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2017, 13 (04) : 713 - 724
  • [8] GPU-accelerated denoising of 3D magnetic resonance images
    Mark Howison
    E. Wes Bethel
    Journal of Real-Time Image Processing, 2017, 13 : 713 - 724
  • [9] GPU-Accelerated Tracking of the Motion of 3D Articulated Figure
    Krzeszowski, Tomasz
    Kwolek, Bogdan
    Wojciechowski, Konrad
    COMPUTER VISION AND GRAPHICS, PT I, 2010, 6374 : 155 - 162
  • [10] On the GPU-CPU Performance Portability of OpenCL for 3D Stencil Computations
    Su, Huayou
    Wu, Nan
    Wen, Mei
    Zhang, Chunyuan
    Cai, Xing
    2013 19TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2013), 2013 : 78 - 85