Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program

Authors
Zhao Xia (赵夏) [1,2]
Ma Sheng (马胜) [1,2]
Chen Wei (陈微) [1,2]
Wang Zhiying (王志英) [1,2]
Affiliations
[1] State Key Laboratory of High Performance Computing
[2] College of Computer, National University of Defense Technology
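Funding
Specialized Research Fund for the Doctoral Program of the Ministry of Education of China; Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China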
Keywords
general purpose graphics processing unit (GPGPU); multicore; intra-kernel; inter-kernel; parallel
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
Simulation is an important means of evaluating the performance of computer architectures. At present, the serial simulation of the general purpose graphics processing unit (GPGPU) architecture is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform, and we apply both methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units within one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between compute units. The inter-kernel parallelization method divides the kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts. Experimental results show that the intra-kernel parallelization method achieves a speedup of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and that the inter-kernel parallelization method accelerates the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. Because the two methods are orthogonal, they can be combined on multiple multicore hosts for further performance improvements.
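The first step of the intra-kernel method, simulating several compute units in parallel within one cycle and synchronizing before the next, can be illustrated with a minimal sketch. This is not GPGPU-sim code: the `ComputeUnit` class, its `step` method, and the `simulate` function are hypothetical names introduced only for illustration, and the per-cycle "work" is a placeholder. CPython threads will not yield a real speedup here because of the global interpreter lock; the sketch only shows the synchronization structure.

```python
import threading


class ComputeUnit:
    """Toy stand-in for one simulated compute unit (hypothetical)."""

    def __init__(self, uid):
        self.uid = uid
        self.cycles_done = 0
        self.work = 0

    def step(self, cycle):
        # Placeholder for one cycle of simulation of this unit.
        self.work += (self.uid + 1) * cycle
        self.cycles_done += 1


def simulate(num_units, num_cycles, num_threads):
    """Advance all units cycle by cycle, with units split across threads."""
    units = [ComputeUnit(i) for i in range(num_units)]
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        # Static round-robin partition: thread tid owns units tid, tid+N, ...
        mine = units[tid::num_threads]
        for cycle in range(num_cycles):
            for unit in mine:
                unit.step(cycle)
            # No thread starts the next cycle until every thread
            # has finished the current one.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return units
```

Calling `simulate(8, 10, 4)` advances eight toy units through ten cycles on four threads. The barrier at the end of every cycle is the kind of synchronization cost that, according to the abstract, the second step of the intra-kernel method aims to reduce.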
Pages: 280-288
Page count: 9
Source
Zhao X., Ma S., Chen W., Wang Z. Exploiting parallelism in the simulation of general purpose graphics processing unit program. Journal of Shanghai Jiaotong University (Science), 2016, 21 (03): 280-288