GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory

Times cited: 4
Authors
Adamek, Karel [1]
Dimoudi, Sofia [2, 4]
Giles, Mike [3]
Armour, Wesley [1]
Affiliations
[1] Univ Oxford, Oxford E Res Ctr, Dept Engn Sci, 7 Keble Rd, Oxford OX1 3QG, England
[2] Univ Durham, Ctr Adv Instrumentat, Durham, England
[3] Univ Oxford, Math Inst, Andrew Wiles Bldg, Radcliffe Observ Quarter 550, Oxford OX2 6GG, England
[4] Ctr Adv Instrumentat, Dept Phys, Sci Labs, South Rd, Durham DH1 3LE, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Fast convolution; CUDA; GPU; overlap-and-save; FFT; ALGORITHMS;
DOI
10.1145/3394116
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
We present a GPU-tailored implementation of the overlap-and-save method, a technique for convolving very long signals with short response functions. We have implemented several FFT algorithms in CUDA that exploit GPU shared memory, enabling GPU-accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm that uses the NVIDIA FFT library (cuFFT). We demonstrate that a shared-memory-based FFT achieves significant speed-ups for certain problem sizes and lowers the memory requirements of the overlap-and-save method on GPUs.
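To make the method concrete, here is a minimal CPU sketch of overlap-and-save in pure Python. It is not the authors' CUDA implementation: it swaps in a textbook recursive radix-2 FFT in place of a shared-memory GPU FFT or cuFFT, and the names `fft`, `overlap_save`, and the `fft_size` parameter are hypothetical. Consecutive input segments of length `fft_size` overlap by `len(h) - 1` samples. After pointwise multiplication with the filter spectrum and an inverse FFT, the first `len(h) - 1` outputs of each segment are corrupted by circular wrap-around and are discarded, so each segment contributes `fft_size - len(h) + 1` valid samples.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unnormalized (caller divides by N)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def overlap_save(signal, h, fft_size=16):
    """Full linear convolution of a long signal with a short filter h
    using overlap-and-save (hypothetical helper, CPU illustration only).
    Returns len(signal) + len(h) - 1 samples."""
    m = len(h)
    if fft_size < m or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two >= len(h)")
    step = fft_size - (m - 1)  # valid output samples per segment
    # Filter spectrum: zero-pad h to fft_size and transform once.
    H = fft(list(h) + [0.0] * (fft_size - m))
    # Prepend m-1 zeros so the first segment sees a zero history, and
    # append zeros so the tail of the full convolution is produced.
    padded = [0.0] * (m - 1) + list(signal) + [0.0] * fft_size
    out_len = len(signal) + m - 1
    out = []
    pos = 0
    while len(out) < out_len:
        segment = padded[pos:pos + fft_size]
        X = fft(segment)
        y = fft([a * b for a, b in zip(X, H)], inverse=True)
        # Discard the first m-1 samples (aliased by circular convolution)
        # and normalize the inverse FFT by fft_size.
        out.extend(v.real / fft_size for v in y[m - 1:])
        pos += step  # next segment overlaps this one by m-1 samples
    return out[:out_len]
```

For example, `overlap_save(x, [0.5, -1.0, 0.25], fft_size=8)` should match direct linear convolution of `x` with the three-tap filter up to floating-point rounding. Because every segment is processed independently against the same precomputed filter spectrum, the segments map naturally onto parallel GPU work.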
Pages: 20