Performance Comparison of CUDA and OpenACC Based on Optimizations

Authors
Li, Xuechao [1]
Shih, Po-Chou [2]
Affiliations
[1] Concordia Univ Chicago, Dept Comp Sci, River Forest, IL 60305 USA
[2] Natl Taipei Univ Technol, Inst Ind & Business Management, Taipei, Taiwan
Keywords
Performance comparison; OpenACC; CUDA; Parallel Optimization
DOI
10.1145/3234664.3234681
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a performance comparison between CUDA and OpenACC under various optimizations, using 19 kernels from 10 benchmarks. The performance analysis focuses on programming models, optimization techniques, and the underlying compilers. It measures and compares kernel execution times and data transfer times to and from the GPU, and uses a Performance Ratio metric to make the comparison objective. The experimental results show that, in general, the PGI compiler translates OpenACC kernels into object code that is slightly slower than hand-written CUDA code for benchmarks that solve the same problem. Data transfers in OpenACC programs tend to be much faster than in CUDA, although the number of memcpy calls tends to be higher. Overall, we conclude that OpenACC is a very reliable programming model and a good alternative to CUDA for accelerator devices. For the programs in our test corpus, OpenACC performs as well as CUDA, and in general OpenACC is the better choice for novices and for programmers targeting multiple platforms.
Pages: 53-57
Number of pages: 5