RT-CUDA: A Software Tool for CUDA Code Restructuring

Authors
Ayaz H. Khan
Mayez Al-Mouhamed
Muhammed Al-Mulhem
Adel F. Ahmed
Affiliations
[1] Qassim University, Computer Science Department
[2] King Fahd University of Petroleum and Minerals, Computer Engineering Department
[3] King Fahd University of Petroleum and Minerals, Information and Computer Science Department
Keywords
CUDA; GPGPU; NVIDIA Kepler; Massively parallel programming; Kernel optimizations
DOI
Not available
Abstract
Recent developments in graphics processing units (GPUs) have opened a new challenge: harnessing their computing power as a general-purpose computing paradigm. However, porting applications to CUDA remains difficult for average programmers, who must package code in separate functions, explicitly manage data transfers between host and device memories, and manually optimize GPU memory utilization. In this paper, we propose a restructuring tool (RT-CUDA) that takes a C-like program and some user directives as compiler hints and produces optimized CUDA code. The tool's strategy is based on efficient management of the memory system to minimize data motion: it manages transfers between host and device, maximizes bandwidth for device memory accesses, and enhances data locality and reuse of cached data using shared memory and registers. Enhanced resource utilization is achieved by rewriting code as parametric kernels and by efficient auto-tuning. The tool enables calls to numerical libraries (CuBLAS, CuSPARSE, etc.) to help implement scientific simulation applications such as iterative linear algebra solvers. For these applications, the tool implements an inter-block global synchronization that lets execution carry over across a few iterations, which helps balance load and avoid polling. RT-CUDA has been evaluated using a variety of basic linear algebra operators (Madd, MM, MV, VV, etc.) as well as iterative solvers for systems of linear equations, such as the Jacobi and Conjugate Gradient algorithms. Significant speedup has been achieved over other compilers, such as PGI OpenACC and GPGPU compilers, for the above applications. The evaluation shows that the generated kernels call math libraries efficiently and make it possible to implement complete iterative solvers. The tool helps scientists develop parallel simulators, such as reservoir simulators and molecular dynamics codes, without exposing them to the complexity of GPU and CUDA programming.
We have a partnership with a group of researchers at Saudi Aramco, a national company in Saudi Arabia. This group is currently exploring RT-CUDA as a potential development tool for applications involving linear algebra solvers. In addition, RT-CUDA is being used by senior and graduate students at King Fahd University of Petroleum and Minerals in their projects, as part of the continuous enhancement of RT-CUDA.
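The abstract names Jacobi as one of the iterative solvers used in the evaluation. As a rough illustration of the kind of sequential, C-like starting point that a restructuring tool would work from, here is a minimal Jacobi sketch in plain C. It is not taken from the paper: the function names `jacobi_sweep` and `jacobi_solve`, the row-major matrix layout, and the stopping rule are all assumptions made for this example, and no RT-CUDA directives are shown because their syntax is not given in this record.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: one Jacobi sweep for A x = b, with A stored
   row-major as an n x n array. Reads x_old, writes x_new, and returns
   the largest absolute change in any component. */
static double jacobi_sweep(size_t n, const double *A, const double *b,
                           const double *x_old, double *x_new)
{
    double max_delta = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Sum the off-diagonal contributions of row i. */
        double sigma = 0.0;
        for (size_t j = 0; j < n; ++j) {
            if (j != i) {
                sigma += A[i * n + j] * x_old[j];
            }
        }
        x_new[i] = (b[i] - sigma) / A[i * n + i];

        double delta = fabs(x_new[i] - x_old[i]);
        if (delta > max_delta) {
            max_delta = delta;
        }
    }
    return max_delta;
}

/* Hypothetical driver: repeats sweeps until the largest update falls
   below tol or max_iters sweeps have run. x holds the initial guess on
   entry and the result on exit; scratch is caller-provided workspace of
   length n. Returns the number of sweeps performed. */
static int jacobi_solve(size_t n, const double *A, const double *b,
                        double *x, double *scratch,
                        double tol, int max_iters)
{
    int iters = 0;
    while (iters < max_iters) {
        double delta = jacobi_sweep(n, A, b, x, scratch);
        for (size_t i = 0; i < n; ++i) {
            x[i] = scratch[i];
        }
        ++iters;
        if (delta < tol) {
            break;
        }
    }
    return iters;
}
```

Each row update inside a sweep reads only `x_old`, so the rows are independent of one another and could be assigned to separate GPU threads. The convergence check between sweeps is the point where all rows must have finished, which is where a synchronization across thread blocks, such as the one the abstract mentions, would be needed.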
Pages: 551 - 594
Number of pages: 43
Related papers
50 in total
  • [1] RT-CUDA: A Software Tool for CUDA Code Restructuring
    Khan, Ayaz H.
    Al-Mouhamed, Mayez
    Al-Mulhem, Muhammed
    Ahmed, Adel F.
    [J]. INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2017, 45 (03) : 551 - 594
  • [2] FLAP: Tool to generate CUDA code from sequential C code
    Hernandez Rubio, Erika
    Meneses Viveros, Amilcar
    Cortes Perez, Pedro M.
    Hernandez Zavala, Sergio D.
    Martinez Rios, Hector M.
    [J]. 2014 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND COMPUTERS (CONIELECOMP), 2014, : 35 - 40
  • [3] A Hardware/Software View of CUDA
    Dobravec, Tomaz
    Bulic, Patricio
    [J]. ELEKTROTEHNISKI VESTNIK-ELECTROCHEMICAL REVIEW, 2010, 77 (05) : 267 - 272
  • [4] Polyhedral Parallel Code Generation for CUDA
    Verdoolaege, Sven
    Carlos Juega, Juan
    Cohen, Albert
    Ignacio Gomez, Jose
    Tenllado, Christian
    Catthoor, Francky
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 9 (04)
  • [5] CUDABlock: A GUI Programming Tool for CUDA
    Lin, Hsih-Hsin
    Tu, Chia-Heng
    Hwang, Yuan-Shin
    [J]. 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, 2015, : 37 - 42
  • [6] CUDA code support in Multiagent platform JADE
    Zaoralek, Lukas
    Gajdos, Petr
    [J]. 2012 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA), 2012, : 546 - 551
  • [7] Porting a Legacy CUDA Stencil Code to oneAPI
    Christgau, Steffen
    Zuse, Thomas Steink
    [J]. 2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020, : 359 - 367
  • [8] GPUBlocks: GUI Programming Tool for CUDA and OpenCL
    Yuan-Shin Hwang
    Hsih-Hsin Lin
    Shen-Hung Pai
    Chia-Heng Tu
    [J]. Journal of Signal Processing Systems, 2019, 91 : 235 - 245
  • [9] GPPT: A Power Prediction Tool for CUDA Applications
    Alavani, Gargi
    Desai, Jineet
    Sarkar, Santonu
    [J]. 2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING WORKSHOPS (ASEW 2021), 2021, : 247 - 250
  • [10] Swan: A tool for porting CUDA programs to OpenCL
    Harvey, M. J.
    De Fabritiis, G.
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2011, 182 (04) : 1093 - 1099