CUDA-quicksort: an improved GPU-based implementation of quicksort

Cited by: 11
Authors
Manca, Emanuele [1]
Manconi, Andrea [2]
Orro, Alessandro [2]
Armano, Giuliano [1]
Milanesi, Luciano [2]
Affiliations
[1] Univ Cagliari, Dept Elect & Elect Engn, I-09123 Cagliari, Italy
[2] CNR, Inst Biomed Technol, I-20090 Segrate, MI, Italy
Source
Keywords
high performance computing; GPU; CUDA; quick sort
DOI
10.1002/cpe.3611
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Sorting is a very important task in computer science and becomes a critical operation for programs that make heavy use of sorting algorithms. General-purpose computing on Graphics Processing Units (GPUs) has been used successfully to parallelize some sorting algorithms. Two GPU-based implementations of quicksort have been presented in the literature: GPU-quicksort, an iterative compute unified device architecture (CUDA) implementation, and CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA-quicksort, an iterative GPU-based implementation of the sorting algorithm. CUDA-quicksort has been designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives to perform inter-block communications while ensuring optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort. An in-depth comparison of the performance of CUDA-quicksort and GPU-quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using CUDA dynamic parallelism, we implemented a recursive version of CUDA-quicksort. Experimental results show that CUDA-quicksort is faster than the CDP-quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright (c) 2015 John Wiley & Sons, Ltd.
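The abstract mentions atomic primitives for inter-block communication during partitioning. The following is a minimal CPU-side sketch of that general idea, not the authors' kernel: each `std::thread` stands in for one GPU thread block, and `std::atomic::fetch_add` stands in for CUDA's `atomicAdd`. The function name `atomic_partition`, its parameters, and the chunking scheme are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): partitions `in` around `pivot`
// into `out` and returns the index of the first element >= pivot.
// Each worker thread plays the role of one GPU thread block.
std::size_t atomic_partition(const std::vector<int>& in, std::vector<int>& out,
                             int pivot, std::size_t num_blocks) {
    const std::size_t n = in.size();
    out.assign(n, 0);
    if (n == 0) return 0;
    num_blocks = std::max<std::size_t>(1, num_blocks);
    const std::size_t chunk = (n + num_blocks - 1) / num_blocks;

    // Shared counters: slots already reserved at the left and right ends.
    std::atomic<std::size_t> left_used{0};
    std::atomic<std::size_t> right_used{0};

    auto worker = [&](std::size_t b) {
        const std::size_t begin = std::min(n, b * chunk);
        const std::size_t end = std::min(n, begin + chunk);

        // Pass 1: count how many local elements fall on each side.
        std::size_t less = 0;
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) ++less;
        }
        const std::size_t greater = (end - begin) - less;

        // Reserve disjoint output ranges with one atomic add per side,
        // so no separate cross-block prefix-sum pass is needed.
        std::size_t lo = left_used.fetch_add(less);
        std::size_t hi = n - right_used.fetch_add(greater) - greater;

        // Pass 2: scatter local elements into the reserved ranges.
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) out[lo++] = in[i];
            else out[hi++] = in[i];
        }
    };

    std::vector<std::thread> blocks;
    for (std::size_t b = 0; b < num_blocks; ++b) blocks.emplace_back(worker, b);
    for (auto& t : blocks) t.join();

    return left_used.load();
}
```

Calling `atomic_partition(in, out, 5, 3)` on a ten-element vector splits the work into three chunks; the order of elements within each side depends on thread scheduling, which is acceptable because quicksort recurses on both sides anyway. With GCC or Clang, building with `-pthread` may be required.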
Pages: 21-43
Number of pages: 23
Related papers
50 results in total
  • [21] GPU-based parallel computation for structural dynamic response analysis with CUDA
    Kang, Dong-Keun
    Kim, Chang-Wan
    Yang, Hyun-Ik
    JOURNAL OF MECHANICAL SCIENCE AND TECHNOLOGY, 2014, 28 (10) : 4155 - 4162
  • [22] GPU-based parallel computation for structural dynamic response analysis with CUDA
    Dong-Keun Kang
    Chang-Wan Kim
    Hyun-Ik Yang
    Journal of Mechanical Science and Technology, 2014, 28 : 4155 - 4162
  • [23] Hypergraph Partitioning Implementation for Parallelizing Matrix-Vector Multiplication Using CUDA GPU-Based Parallel Computing
    Murni
    Bustamam, A.
    Ernastuti
    Handhika, T.
    Kerami, D.
    INTERNATIONAL SYMPOSIUM ON CURRENT PROGRESS IN MATHEMATICS AND SCIENCES 2016 (ISCPMS 2016), 2017, 1862
  • [24] CAVLCU: an efficient GPU-based implementation of CAVLC
    Fuentes-Alventosa, Antonio
    Gomez-Luna, Juan
    Maria Gonzalez-Linares, Jose
    Guil, Nicolas
    Medina-Carnicer, R.
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (06) : 7556 - 7590
  • [25] A GPU-based Implementation of an Enhanced GEP Algorithm
    Shao, Shuai
    Liu, Xiyang
    Zhou, Mingyuan
    Zhan, Jiguo
    Liu, Xin
    Chu, Yanli
    Chen, Hao
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2012 : 999 - 1006
  • [26] GPU-based Parallel Implementation of SAR Imaging
    Jin, Xingxing
    Ko, Seok-Bum
    2012 INTERNATIONAL SYMPOSIUM ON ELECTRONIC SYSTEM DESIGN (ISED 2012), 2012 : 125 - 129
  • [27] CAVLCU: an efficient GPU-based implementation of CAVLC
    Antonio Fuentes-Alventosa
    Juan Gómez-Luna
    José Maria González-Linares
    Nicolás Guil
    R. Medina-Carnicer
    The Journal of Supercomputing, 2022, 78 : 7556 - 7590
  • [28] Towards a GPU-based implementation of interaction nets
    Jiresch, Eugen
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2014, (143) : 41 - 53
  • [29] A GPU-based Implementation of Brain Storm Optimization
    Jin, Chen
    Qin, A. K.
    2017 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2017 : 2698 - 2705
  • [30] A GPU-based Implementation of a Sensor Tasking Methodology
    Abusultan, M.
    Chakravorty, S.
    Khatri, S. P.
    2016 19TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2016 : 1398 - 1405