Exploration of automatic optimisation for CUDA programming

Cited by: 2
Authors
Al-Mouhamed, Mayez [1]
ul Hassan Khan, Ayaz [1]
Affiliations
[1] King Fahd Univ Petr & Minerals, Dept Comp Engn, Dhahran, Saudi Arabia
Keywords
CUDA; GPU; parallel programming; compiler transformations; directive-based language; source-to-source compiler; GPGPU
DOI
10.1080/17445760.2014.953158
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Writing optimised compute unified device architecture (CUDA) programs for graphics processing units (GPUs) is complex even for experts. We present a design methodology for a restructuring tool that converts C-loops into optimised CUDA kernels using a three-step algorithm: loop tiling, coalesced memory access and resource optimisation. A method for finding possible loop tiling solutions with coalesced memory access is developed, and a simplified algorithm for restructuring C-loops into an efficient CUDA kernel is presented. In the evaluation, we implement matrix multiply (MM), matrix transpose (M-transpose), matrix scaling (M-scaling) and matrix vector multiply (MV) using the proposed algorithm. We present an analysis of the execution time and GPU throughput for these applications, which compare favourably with other proposals. Evaluation is carried out while scaling the problem size and running under a variety of kernel configurations. The obtained speedup is about 28-35% for M-transpose compared to the NVIDIA Software Development Kit, 33% for MV compared to the general purpose computation on graphics processing units (GPGPU) compiler, and more than 80% for MM and M-scaling compared to CUDA-lite.
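The loop tiling step named in the abstract can be illustrated with a minimal plain C sketch of a matrix transpose. This is not the authors' restructuring tool or its output. The function names, the `TILE` size and the flat row-major layout are assumptions made for illustration, and the comments that relate tile loops to CUDA blocks, threads and shared memory only indicate how such a mapping is commonly done.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical tile edge; n must be a multiple of TILE */

/* Original C loop nest: out = transpose(in), both n x n, row-major. */
void transpose_naive(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];
}

/* Tiled version of the same loop nest.
 * The two outer loops walk over tiles (on a GPU these would typically
 * become the block indices). The inner loops walk inside one tile (the
 * thread indices). The local buffer stands in for a shared-memory tile. */
void transpose_tiled(const float *in, float *out, size_t n) {
    assert(n % TILE == 0);
    for (size_t bi = 0; bi < n; bi += TILE) {
        for (size_t bj = 0; bj < n; bj += TILE) {
            float tile[TILE][TILE];

            /* Read the tile row by row: consecutive tj values touch
             * consecutive addresses of in. */
            for (size_t ti = 0; ti < TILE; ++ti)
                for (size_t tj = 0; tj < TILE; ++tj)
                    tile[ti][tj] = in[(bi + ti) * n + (bj + tj)];

            /* Write the transposed tile row by row: consecutive tj
             * values touch consecutive addresses of out. */
            for (size_t ti = 0; ti < TILE; ++ti)
                for (size_t tj = 0; tj < TILE; ++tj)
                    out[(bj + ti) * n + (bi + tj)] = tile[tj][ti];
        }
    }
}
```

Both functions produce the same result. The tiled form differs only in access order: the innermost index always moves along a row of `in` or `out`, which is the contiguous access pattern that coalescing on a GPU relies on.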
Pages: 309-324
Page count: 16
Related papers
50 in total
  • [1] Exploration of Automatic Optimization for CUDA Programming
    Al-Mouhamed, Mayez
    Khan, Ayaz ul Hassan
    [J]. 2012 2ND IEEE INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2012, : 55 - 60
  • [2] A CUDA programming toolkit on grids
    Liang, Tyng-Yeu
    Chang, Yu-Wei
    Li, Hung-Fu
    [J]. INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2012, 3 (2-3) : 97 - 111
  • [3] CUDA memory optimisation strategies for motion estimation
    Sayadi, Fatma Elzahra
    Chouchene, Marwa
    Bahri, Haithem
    Khemiri, Randa
    Atri, Mohamed
    [J]. IET COMPUTERS AND DIGITAL TECHNIQUES, 2019, 13 (01) : 20 - 27
  • [4] GSGP-CUDA-A CUDA framework for Geometric Semantic Genetic Programming
    Trujillo, Leonardo
    Munoz Contreras, Jose Manuel
    Hernandez, Daniel E.
    Castelli, Mauro
    Tapia, Juan J.
    [J]. SOFTWAREX, 2022, 18
  • [5] Structural testing for CUDA programming model
    Luz, Helder J. F.
    Souza, Paulo S. L.
    Souza, Simone R. S.
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (14)
  • [6] CUDABlock: A GUI Programming Tool for CUDA
    Lin, Hsih-Hsin
    Tu, Chia-Heng
    Hwang, Yuan-Shin
    [J]. 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, 2015, : 37 - 42
  • [7] Web applications for learning CUDA programming
    Wakatani, Akiyoshi
    Maeda, Toshiyuki
    [J]. 2017 8TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS & APPLICATIONS (IISA), 2017, : 571 - 575
  • [8] CUDA-enabled Optimisation of Technical Analysis Parameters
    O'Rourke, John
    Burns, John
    [J]. 2012 IEEE/ACM 16TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), 2012, : 221 - 227
  • [9] GPUBlocks: GUI Programming Tool for CUDA and OpenCL
    Yuan-Shin Hwang
    Hsih-Hsin Lin
    Shen-Hung Pai
    Chia-Heng Tu
    [J]. Journal of Signal Processing Systems, 2019, 91 : 235 - 245
  • [10] CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming
    Yi, Xinyao
    Stokes, David
    Yan, Yonghong
    Liao, Chunhua
    [J]. 2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 397 - 406