CUDA-quicksort: an improved GPU-based implementation of quicksort

Cited by: 11
Authors
Manca, Emanuele [1 ]
Manconi, Andrea [2 ]
Orro, Alessandro [2 ]
Armano, Giuliano [1 ]
Milanesi, Luciano [2 ]
Affiliations
[1] Univ Cagliari, Dept Elect & Elect Engn, I-09123 Cagliari, Italy
[2] CNR, Inst Biomed Technol, I-20090 Segrate, MI, Italy
Source
Keywords
high performance computing; GPU; CUDA; quick sort;
DOI
10.1002/cpe.3611
Chinese Library Classification
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract
Sorting is a very important task in computer science and a critical operation for programs that make heavy use of sorting algorithms. General-purpose computing on Graphics Processing Units (GPUs) has been successfully used to parallelize several sorting algorithms. Two GPU-based implementations of quicksort have been presented in the literature: GPU-quicksort, an iterative implementation based on the Compute Unified Device Architecture (CUDA), and CUDA Dynamic Parallelism (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA-quicksort, an iterative GPU-based implementation of the quicksort algorithm. CUDA-quicksort has been designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives to perform inter-block communication while ensuring optimized access to GPU memory. Experiments performed on six benchmark input distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort. An in-depth performance comparison of CUDA-quicksort and GPU-quicksort shows that the main improvement comes from the optimized GPU memory access rather than from the use of atomic primitives. Moreover, to assess the advantages of CUDA dynamic parallelism, we implemented a recursive version of CUDA-quicksort. Experimental results show that CUDA-quicksort is faster than the CDP-quicksort provided by NVIDIA, with the iterative implementation achieving better performance than the recursive one. Copyright (c) 2015 John Wiley & Sons, Ltd.
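To make the abstract's central idea concrete (thread blocks partition a subsequence cooperatively and coordinate through atomic primitives, inside an iterative rather than recursive driver), here is a minimal CPU analogue in C++17. It assumes `std::thread` workers stand in for CUDA thread blocks and `std::atomic` `fetch_add`/`fetch_sub` stand in for CUDA `atomicAdd` on global counters. The function names `partition_blocks` and `iterative_quicksort`, the middle-element pivot, and the chunking scheme are hypothetical illustrations, not the authors' kernel, and the sketch does not model the GPU memory-access optimizations the abstract credits for most of the speedup.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Partitions data[lo, hi) around `pivot` into out[lo, hi) using `blocks`
// worker threads (hypothetical stand-ins for CUDA thread blocks). Each worker
// counts its chunk's elements < pivot and > pivot, then reserves a private
// output range on each side with ONE atomic operation per side (a stand-in
// for atomicAdd on a global counter). Returns {lt_end, gt_begin}; elements
// equal to the pivot fill the gap [lt_end, gt_begin).
std::pair<std::size_t, std::size_t> partition_blocks(
    const std::vector<int>& data, std::vector<int>& out, std::size_t lo,
    std::size_t hi, int pivot, unsigned blocks) {
  if (blocks == 0) blocks = 1;
  std::atomic<std::size_t> left{lo};   // next free slot on the "< pivot" side
  std::atomic<std::size_t> right{hi};  // end of the free "> pivot" region
  const std::size_t chunk = (hi - lo + blocks - 1) / blocks;
  std::vector<std::thread> workers;
  for (unsigned b = 0; b < blocks; ++b) {
    const std::size_t begin = lo + b * chunk;
    const std::size_t end = std::min(hi, begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([&, begin, end] {
      std::size_t nl = 0, ng = 0;
      for (std::size_t i = begin; i < end; ++i) {
        if (data[i] < pivot) ++nl;
        else if (data[i] > pivot) ++ng;
      }
      std::size_t l = left.fetch_add(nl);        // claim [l, l + nl)
      std::size_t r = right.fetch_sub(ng) - ng;  // claim [r, r + ng)
      for (std::size_t i = begin; i < end; ++i) {
        if (data[i] < pivot) out[l++] = data[i];
        else if (data[i] > pivot) out[r++] = data[i];
      }
    });
  }
  for (auto& w : workers) w.join();
  const std::size_t lt_end = left.load(), gt_begin = right.load();
  assert(lt_end <= gt_begin);
  std::fill(out.begin() + lt_end, out.begin() + gt_begin, pivot);
  return {lt_end, gt_begin};
}

// Iterative driver: an explicit work list of [lo, hi) ranges replaces
// recursion. The pivot rule (middle element) is a hypothetical choice.
void iterative_quicksort(std::vector<int>& a, unsigned blocks = 4) {
  std::vector<int> buf(a.size());
  std::vector<std::pair<std::size_t, std::size_t>> work{{0, a.size()}};
  while (!work.empty()) {
    const auto [lo, hi] = work.back();
    work.pop_back();
    if (hi - lo < 2) continue;
    const int pivot = a[lo + (hi - lo) / 2];
    const auto [lt_end, gt_begin] =
        partition_blocks(a, buf, lo, hi, pivot, blocks);
    std::copy(buf.begin() + lo, buf.begin() + hi, a.begin() + lo);
    work.push_back({lo, lt_end});    // "< pivot" part still needs sorting
    work.push_back({gt_begin, hi});  // "> pivot" part still needs sorting
  }
}
```

For example, calling `iterative_quicksort(v, 3)` on `{5, 3, 9, 1, 5, 7, 2, 8, 5, 0}` leaves `v` sorted. The design choice being illustrated is that each block touches the shared counters only twice per partition step, so atomic contention grows with the number of blocks rather than with the number of elements; the order of elements within each side depends on thread scheduling, but the final sorted result does not.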
Pages: 21-43
Number of pages: 23
Related papers
50 in total
  • [31] An Improved GPU-Based SAT Model Counter
    Fichte, Johannes K.
    Hecher, Markus
    Zisser, Markus
    PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2019, 2019, 11802 : 491 - 509
  • [32] The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran
    Kargaran, Hamed
    Minuchehr, Abdolhamid
    Zolfaghari, Ahmad
    AIP ADVANCES, 2016, 6 (04)
  • [33] GPU-Based acceleration of an automatic white matter segmentation algorithm using CUDA
    Labra, Nicole
    Figueroa, Miguel
    Guevara, Pamela
    Duclap, Delphine
    Hoeunou, Josselin
    Poupon, Cyril
    Mangin, Jean-Francois
    2013 35TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2013, : 89 - 92
  • [34] QuickSort: improved right-tail asymptotics for the limiting distribution, and large deviations
    Fill, James Allen
    Hung, Wei-Chun
    ELECTRONIC JOURNAL OF PROBABILITY, 2019, 24 : 1 - 13
  • [35] A Survey on GPU-Based Implementation of Swarm Intelligence Algorithms
    Tan, Ying
    Ding, Ke
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (09) : 2028 - 2041
  • [36] A GPU-BASED IMPLEMENTATION ON SUPER-RESOLUTION RECONSTRUCTION
    Wang, Kai
    Wang, Lifu
    Lu, Jian
    Sun, Yi
    Zhao, Shuping
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 849 - 852
  • [37] A GPU-based implementation of the MRF algorithm in ITK package
    Valero, Pedro
    Sánchez, José L.
    Cazorla, Diego
    Arias, Enrique
    JOURNAL OF SUPERCOMPUTING, 2011, 58 : 403 - 410
  • [38] A FAST GPU-BASED IMPLEMENTATION OF AN SUPERPOSITION/CONVOLUTION ALGORITHM
    Diez-Domingo, S.
    Reinado, D.
    Cortina, T.
    Cazorla, D.
    Sanchez, J. L.
    Alonso, S.
    Ricos, B.
    Gonzalez, R.
    RADIOTHERAPY AND ONCOLOGY, 2010, 96 : S479 - S480
  • [39] A Fast and Generic GPU-Based Parallel Reduction Implementation
    Rfaei Jradi, Walid Abdala
    Dantas do Nascimento, Hugo Alexandre
    Martins, Wellington Santos
    2018 SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (WSCAD 2018), 2018, : 16 - 22
  • [40] Radial Basis Function Networks GPU-Based Implementation
    Brandstetter, Andreas
    Artusi, Alessandro
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2008, 19 (12) : 2150 - 2154