CUDA-quicksort: an improved GPU-based implementation of quicksort

Cited by: 11
Authors
Manca, Emanuele [1]
Manconi, Andrea [2]
Orro, Alessandro [2]
Armano, Giuliano [1]
Milanesi, Luciano [2]
Affiliations
[1] Univ Cagliari, Dept Elect & Elect Engn, I-09123 Cagliari, Italy
[2] CNR, Inst Biomed Technol, I-20090 Segrate, MI, Italy
Source
Keywords
high performance computing; GPU; CUDA; quick sort
DOI
10.1002/cpe.3611
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Sorting is a very important task in computer science and becomes a critical operation for programs making heavy use of sorting algorithms. General-purpose computing on Graphics Processing Units (GPUs) has been successfully used to parallelize some sorting algorithms. Two GPU-based implementations of quicksort have been presented in the literature: GPU-quicksort, an iterative compute-unified device architecture (CUDA) implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA-quicksort, an iterative GPU-based implementation of the sorting algorithm. CUDA-quicksort has been designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives to perform inter-block communications while ensuring optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort. An in-depth comparison of the performance of CUDA-quicksort and GPU-quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using CUDA dynamic parallelism, we implemented a recursive version of CUDA-quicksort. Experimental results show that CUDA-quicksort is faster than the CDP-quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright (c) 2015 John Wiley & Sons, Ltd.
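The abstract mentions atomic primitives for inter-block communication but gives no code. The following is only a minimal CPU-side sketch of that general idea, not the authors' implementation: `std::thread` workers stand in for GPU thread blocks, and `std::atomic::fetch_add` stands in for a device atomic add. The function name `atomic_partition` and its parameters are hypothetical, chosen for illustration. Each "block" counts its elements on either side of the pivot, reserves a private output range with one atomic add per side, and then scatters its elements into that range.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration: one quicksort partition step in which worker
// threads (stand-ins for GPU thread blocks) coordinate only through two
// atomic counters. Returns the index where the ">= pivot" region begins.
std::size_t atomic_partition(const std::vector<int>& in, std::vector<int>& out,
                             int pivot, std::size_t num_blocks) {
    const std::size_t n = in.size();
    out.assign(n, 0);
    if (num_blocks == 0) num_blocks = 1;

    std::atomic<std::size_t> left_used{0};   // slots reserved at the front (< pivot)
    std::atomic<std::size_t> right_used{0};  // slots reserved at the back (>= pivot)
    const std::size_t chunk = (n + num_blocks - 1) / num_blocks;

    auto worker = [&](std::size_t b) {
        const std::size_t begin = std::min(n, b * chunk);
        const std::size_t end = std::min(n, begin + chunk);

        // Pass 1: count this block's elements that are smaller than the pivot.
        std::size_t less = 0;
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) ++less;
        }
        const std::size_t geq = (end - begin) - less;

        // One atomic add per side reserves a range no other block will touch.
        std::size_t lpos = left_used.fetch_add(less);
        std::size_t rpos = n - right_used.fetch_add(geq) - geq;

        // Pass 2: scatter the elements into the reserved ranges.
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) out[lpos++] = in[i];
            else               out[rpos++] = in[i];
        }
    };

    std::vector<std::thread> blocks;
    for (std::size_t b = 0; b < num_blocks; ++b) blocks.emplace_back(worker, b);
    for (auto& t : blocks) t.join();

    return left_used.load();
}
```

Calling `atomic_partition(in, out, pivot, 3)` fills `out` so that every element before the returned index is smaller than `pivot` and every element from that index onward is not. The order of elements within each side depends on thread scheduling. A real GPU version would also have to address the memory-access pattern, which the abstract identifies as the main source of the speedup.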
Pages: 21-43
Page count: 23
Related papers
50 in total
  • [11] Increasing the robustness of CUDA Fermi GPU-based systems
    Di Carlo, Stefano
    Gambardella, Giulio
    Indaco, Marco
    Martella, Ippazio
    Prinetto, Paolo
    Rolfo, Daniele
    Trotta, Pascal
    PROCEEDINGS OF THE 2013 IEEE 19TH INTERNATIONAL ON-LINE TESTING SYMPOSIUM (IOLTS), 2013, : 234 - 235
  • [12] A GPU-Based Implementation of ADMIRE
    Khan, Christopher
    Dei, Kazuyuki
    Byram, Brett
    2019 IEEE INTERNATIONAL ULTRASONICS SYMPOSIUM (IUS), 2019, : 1501 - 1504
  • [13] A CLASS OF SORTING ALGORITHMS BASED ON QUICKSORT
    LAGALLY, K
    ZIEGLER, B
    COMMUNICATIONS OF THE ACM, 1986, 29 (04) : 333 - 334
  • [14] A CLASS OF SORTING ALGORITHMS BASED ON QUICKSORT
    WAINWRIGHT, RL
    COMMUNICATIONS OF THE ACM, 1985, 28 (04) : 396 - 402
  • [15] The Design and Implementation of an Improved Lightweight BLASTP on CUDA GPU
    Sun, Xue
    Wu, Chao-Chin
    Liu, Yan-Fang
    SYMMETRY-BASEL, 2021, 13 (12):
  • [16] Implementation of a GPU-based CFD code
    Niksiar, Pooya
    Ashrafizadeh, Ali
    Shams, Mehrzad
    Madani, Amir Hossein
    2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), VOL 1, 2014, : 84 - 89
  • [17] GPU-based Implementation of Reverb Effect
    Nikolov, Dusan V.
    Misic, Marko J.
    Tomasevic, Milo V.
    2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 990 - 993
  • [18] A GPU-Based Parallel Reduction Implementation
    Rfaei Jradi, Walid Abdala
    Dantas do Nascimento, Hugo Alexandre
    Martins, Wellington Santos
    HIGH PERFORMANCE COMPUTING SYSTEMS, WSCAD 2018, 2020, 1171 : 168 - 182
  • [19] Time Sharing Based Multithreading approach to Quicksort
    Guliani, Gurkirat Singh
    Bagga, Rajat
    2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2017,
  • [20] Implementation of CUDA GPU-Based Parallel Computing on Smith-Waterman Algorithm to Sequence Database Searches
    Bustamam, Alhadi
    Ardaneswari, Gianinna
    Lestari, Dian
    2013 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2013, : 137 - 142