Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Cited by: 0

Authors:

Sourouri, Mohammed [1,2]

Baden, Scott B. [3]

Cai, Xing [1,2]

Affiliations:

[1] Simula Res Lab, Oslo, Norway

[2] Univ Oslo, Dept Informat, Oslo, Norway

[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA

Source:

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING | 2017 / Vol. 45 / Issue 03

Keywords:

Source-to-source translation; Code generation; Code optimization; CUDA; OpenMP; MPI; Stencil computation; Heterogeneous computing; CPU plus GPU computing; CODE;

DOI:

10.1007/s10766-016-0454-1

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data motion by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
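
The abstract names sequential 3D stencil codes in C as the framework's input. As a rough illustration only, and not taken from the paper, one sweep of a 7-point stencil of that kind might look like the sketch below. The function names (`idx3`, `stencil7_sweep`), the coefficients `c0` and `c1`, and the flattened array layout are all assumptions made for this example; the paper's actual directive syntax is not given in this record, so no directive is shown.

```c
#include <stddef.h>

/* Hypothetical helper: flattened index into an nx*ny*nz grid, x fastest. */
static size_t idx3(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return i + nx * (j + ny * k);
}

/* Hypothetical example: one Jacobi-style sweep of a 7-point stencil.
 * Only interior points are updated; boundary values of `out` are left
 * untouched. A directive-based framework would annotate a loop nest
 * like this one. */
void stencil7_sweep(const double *in, double *out,
                    size_t nx, size_t ny, size_t nz,
                    double c0, double c1)
{
    if (nx < 3 || ny < 3 || nz < 3)
        return; /* no interior points */

    for (size_t k = 1; k + 1 < nz; ++k)
        for (size_t j = 1; j + 1 < ny; ++j)
            for (size_t i = 1; i + 1 < nx; ++i)
                out[idx3(i, j, k, nx, ny)] =
                    c0 * in[idx3(i, j, k, nx, ny)] +
                    c1 * (in[idx3(i - 1, j, k, nx, ny)] +
                          in[idx3(i + 1, j, k, nx, ny)] +
                          in[idx3(i, j - 1, k, nx, ny)] +
                          in[idx3(i, j + 1, k, nx, ny)] +
                          in[idx3(i, j, k - 1, nx, ny)] +
                          in[idx3(i, j, k + 1, nx, ny)]);
}
```

Each output point depends only on the input grid, so the iterations are independent. That independence is what makes loop nests of this shape candidates for splitting across CPU cores, GPUs, and MPI ranks.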

Citation

Pages: 711 - 729

Page count: 19

50 records in total

[1] Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
Mohammed Sourouri
Scott B. Baden
Xing Cai
International Journal of Parallel Programming, 2017, 45 (3) : 711 - 729
[2] On the GPU Performance of 3D Stencil Computations Implemented in OpenCL
Su, Huayou
Wu, Nan
Wen, Mei
Zhang, Chunyuan
Cai, Xing
SUPERCOMPUTING (ISC 2013), 2013, 7905 : 125 - 135
[3] GPU-Accelerated 3D Normal Distributions Transform
Nguyen, Anh
Cano, Abraham Monrroy
Edahiro, Masato
Kato, Shinpei
JOURNAL OF ROBOTICS AND MECHATRONICS, 2023, 35 (02) : 445 - 459
[4] GPU-accelerated feature tracking for 3D reconstruction
Cao, Mingwei
Jia, Wei
Li, Shujie
Li, Yujie
Zheng, Liping
Liu, Xiaoping
OPTICS AND LASER TECHNOLOGY, 2019, 110 : 165 - 175
[5] GPU-accelerated Parallel 3D Image Thinning
Hu, Bingfeng
Yang, Xuan
2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 149 - 152
[6] GPU-Accelerated Nearest Neighbor Search for 3D Registration
Qiu, Deyuan
May, Stefan
Nuechter, Andreas
COMPUTER VISION SYSTEMS, PROCEEDINGS, 2009, 5815 : 194 - +
[7] GPU-accelerated denoising of 3D magnetic resonance images
Howison, Mark
Bethel, E. Wes
JOURNAL OF REAL-TIME IMAGE PROCESSING, 2017, 13 (04) : 713 - 724
[8] GPU-accelerated denoising of 3D magnetic resonance images
Mark Howison
E. Wes Bethel
Journal of Real-Time Image Processing, 2017, 13 : 713 - 724
[9] GPU-Accelerated Tracking of the Motion of 3D Articulated Figure
Krzeszowski, Tomasz
Kwolek, Bogdan
Wojciechowski, Konrad
COMPUTER VISION AND GRAPHICS, PT I, 2010, 6374 : 155 - 162
[10] On the GPU-CPU Performance Portability of OpenCL for 3D Stencil Computations
Su, Huayou
Wu, Nan
Wen, Mei
Zhang, Chunyuan
Cai, Xing
2013 19TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2013), 2013, : 78 - 85
