GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory

Times cited: 4
Authors
Adamek, Karel [1]
Dimoudi, Sofia [2,4]
Giles, Mike [3]
Armour, Wesley [1]
Affiliations
[1] Univ Oxford, Oxford E Res Ctr, Dept Engn Sci, 7 Keble Rd, Oxford OX1 3QG, England
[2] Univ Durham, Ctr Adv Instrumentat, Durham, England
[3] Univ Oxford, Math Inst, Andrew Wiles Bldg, Radcliffe Observ Quarter 550, Oxford OX2 6GG, England
[4] Ctr Adv Instrumentat, Dept Phys, Sci Labs, South Rd, Durham DH1 3LE, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Fast convolution; CUDA; GPU; overlap-and-save; FFT; Algorithms
DOI
10.1145/3394116
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We present a GPU-tailored implementation of the overlap-and-save method, a method for convolving very long signals with short response functions. We have implemented several FFT algorithms in CUDA that exploit GPU shared memory, enabling GPU-accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm that uses the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared-memory-based FFT, we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
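The overlap-and-save idea itself can be illustrated with a minimal, CPU-only sketch in pure Python. This is not the authors' CUDA code: the functions `fft`, `overlap_save` and `direct_convolve` and the parameter `fft_len` are hypothetical names chosen for illustration, and the sketch assumes a simple recursive radix-2 FFT (so `fft_len` must be a power of two). With an FFT length N and a filter of length M, consecutive segments overlap by M - 1 samples; after circular convolution via the FFT, the first M - 1 outputs of each segment are corrupted by wrap-around and discarded, leaving N - M + 1 valid output samples per segment.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unnormalised (caller divides by N)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def overlap_save(signal, h, fft_len):
    """Linear convolution of a long `signal` with a short filter `h`
    using overlap-and-save with FFT segments of length `fft_len`."""
    m = len(h)
    if m == 0:
        raise ValueError("filter must be non-empty")
    if fft_len <= 0 or (fft_len & (fft_len - 1)) != 0:
        raise ValueError("fft_len must be a power of two")
    step = fft_len - (m - 1)  # valid output samples per segment
    if step <= 0:
        raise ValueError("fft_len must be at least len(h)")
    # The filter spectrum is computed once and reused for every segment.
    H = fft(list(h) + [0.0] * (fft_len - m))
    # Prepend m-1 zeros so the first segment sees the start of the signal.
    padded = [0.0] * (m - 1) + list(signal)
    out_len = len(signal) + m - 1  # length of the full linear convolution
    result = []
    pos = 0
    while len(result) < out_len:
        segment = padded[pos:pos + fft_len]
        segment += [0.0] * (fft_len - len(segment))  # zero-fill past the end
        X = fft(segment)
        y = fft([a * b for a, b in zip(X, H)], inverse=True)
        # Drop the first m-1 samples, which contain circular wrap-around.
        result.extend(v.real / fft_len for v in y[m - 1:])
        pos += step  # next segment overlaps this one by m-1 samples
    return result[:out_len]


def direct_convolve(x, h):
    """Reference O(len(x) * len(h)) linear convolution for checking."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


if __name__ == "__main__":
    sig = [float(i % 7) - 3.0 for i in range(50)]
    h = [0.5, -1.0, 0.25]
    fast = overlap_save(sig, h, fft_len=16)
    ref = direct_convolve(sig, h)
    print(max(abs(a - b) for a, b in zip(fast, ref)) < 1e-9)  # True
```

Because each segment is transformed, multiplied by the filter spectrum and inverse-transformed independently, the loop over segments is the natural unit of parallel work; on a GPU the segments could be processed concurrently rather than one after another as in this sketch.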
Pages: 20