CUDA-quicksort: an improved GPU-based implementation of quicksort

Cited by: 11
Authors
Manca, Emanuele [1]
Manconi, Andrea [2]
Orro, Alessandro [2]
Armano, Giuliano [1]
Milanesi, Luciano [2]
Affiliations
[1] Univ Cagliari, Dept Elect & Elect Engn, I-09123 Cagliari, Italy
[2] CNR, Inst Biomed Technol, I-20090 Segrate, MI, Italy
Keywords
high performance computing; GPU; CUDA; quick sort
DOI
10.1002/cpe.3611
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Sorting is a fundamental task in computer science and becomes a critical operation for programs that make heavy use of sorting algorithms. General-purpose computing on Graphics Processing Units (GPUs) has been successfully used to parallelize some sorting algorithms. Two GPU-based implementations of quicksort have been presented in the literature: GPU-quicksort, an iterative compute-unified device architecture (CUDA) implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA-quicksort, an iterative GPU-based implementation of the sorting algorithm. CUDA-quicksort has been designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives to perform inter-block communications while ensuring optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort. An in-depth performance comparison between CUDA-quicksort and GPU-quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using CUDA dynamic parallelism, we implemented a recursive version of CUDA-quicksort. Experimental results show that CUDA-quicksort is faster than the CDP-quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright (c) 2015 John Wiley & Sons, Ltd.
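The abstract's idea of using atomic primitives for inter-block communication can be illustrated with a minimal CPU sketch. It is not the authors' CUDA code: `atomic_partition` is a hypothetical helper, `std::thread` workers stand in for GPU thread blocks, and `std::atomic::fetch_add` stands in for a device atomic add. Each worker counts how many of its elements fall on each side of the pivot, reserves a disjoint output range with one atomic operation per side, and then writes its elements there.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption for illustration, not from the paper).
// Partitions `in` around `pivot` into `out`: elements < pivot end up on the
// left, elements >= pivot on the right. Returns the index of the first
// element >= pivot. Each worker thread plays the role of one thread block.
std::size_t atomic_partition(const std::vector<int>& in, std::vector<int>& out,
                             int pivot, std::size_t num_workers) {
    assert(num_workers > 0);
    const std::size_t n = in.size();
    out.assign(n, 0);

    std::atomic<std::size_t> left_used{0};   // slots reserved from the left end
    std::atomic<std::size_t> right_used{0};  // slots reserved from the right end

    auto worker = [&](std::size_t begin, std::size_t end) {
        // Pass 1: count the elements of this chunk that are below the pivot.
        std::size_t less = 0;
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) ++less;
        }
        const std::size_t geq = (end - begin) - less;

        // One atomic add per side reserves a range no other worker can get.
        std::size_t l = left_used.fetch_add(less);
        std::size_t r = n - right_used.fetch_add(geq) - geq;

        // Pass 2: scatter the elements into the reserved ranges.
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) out[l++] = in[i];
            else               out[r++] = in[i];
        }
    };

    const std::size_t chunk = (n + num_workers - 1) / num_workers;
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& t : threads) t.join();

    return left_used.load();
}
```

Calling `atomic_partition(in, out, pivot, 4)` fills `out` with a valid partition of `in`; the order of elements within each side depends on thread scheduling, which is acceptable because quicksort goes on to sort each side separately.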
Pages: 21-43
Number of pages: 23