cuRCD: Region covariance descriptor CUDA implementation

Authors
M. Ali Asan
Adnan Ozsoy
Affiliation
[1] Hacettepe University, Computer Engineering Department
Keywords
Parallel region covariance; CUDA; Real-time object detection; GPGPU
DOI
Not available
Abstract
Region covariance is a robust feature descriptor that combines even the simplest image features, such as intensity and gradient, into a well-performing descriptor for image regions. Beyond its robustness, it requires many identical, heavy computations on different parts of the input data, which makes it a good candidate for parallel execution. In this manuscript, we present a real-time parallel implementation of region covariance which, to the best of our knowledge, is the first in the literature. We compared it against existing implementations and achieved execution 6 times faster than a vectorized, parallel CPU implementation, which provides the speedup necessary for real-time processing. Additionally, we improved the existing integral image calculation method on CUDA, reducing memory usage by 50% and achieving the fastest computation speed among existing solutions, and we improved the covariance matrix comparison by using a distance metric that is lightweight to compute and easy to implement.
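For orientation, the descriptor summarized in the abstract can be sketched with the commonly used integral-image formulation: one set of integral images holds running sums of each feature, a second holds running sums of pairwise feature products, and the covariance of any rectangle then follows from four corner lookups per entry. The sketch below is a plain, sequential Python illustration, not the authors' CUDA code. The feature set (intensity and absolute central-difference gradients) and all function names are assumptions made for this example, and it does not reproduce the memory reduction or the distance metric mentioned in the abstract.

```python
# Illustrative sketch only: region covariance via integral images (CPU, pure Python).
# The feature choice and every name here are assumptions, not taken from the paper.

def features(img):
    """Per-pixel feature vectors [I, |dI/dx|, |dI/dy|] with clamped central differences."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ix = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            iy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            row.append([float(img[y][x]), abs(ix) / 2.0, abs(iy) / 2.0])
        out.append(row)
    return out

def integral_tensors(feat):
    """Build zero-padded integral images of features (P) and feature products (Q)."""
    h, w, d = len(feat), len(feat[0]), len(feat[0][0])
    P = [[[0.0] * d for _ in range(w + 1)] for _ in range(h + 1)]
    Q = [[[0.0] * (d * d) for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            f = feat[y][x]
            for i in range(d):
                # Standard recurrence: current value + top + left - top-left.
                P[y + 1][x + 1][i] = (f[i] + P[y][x + 1][i]
                                      + P[y + 1][x][i] - P[y][x][i])
                for j in range(d):
                    k = i * d + j
                    Q[y + 1][x + 1][k] = (f[i] * f[j] + Q[y][x + 1][k]
                                          + Q[y + 1][x][k] - Q[y][x][k])
    return P, Q

def region_covariance(P, Q, x0, y0, x1, y1):
    """Sample covariance over columns x0..x1-1 and rows y0..y1-1 (half-open box)."""
    d = len(P[0][0])
    n = (x1 - x0) * (y1 - y0)  # pixel count; must be at least 2

    def box(T, k):
        # Sum over the box from four corner lookups.
        return T[y1][x1][k] - T[y0][x1][k] - T[y1][x0][k] + T[y0][x0][k]

    s = [box(P, i) for i in range(d)]
    return [[(box(Q, i * d + j) - s[i] * s[j] / n) / (n - 1)
             for j in range(d)] for i in range(d)]
```

A call such as `region_covariance(P, Q, 1, 0, 4, 3)` returns a symmetric d x d matrix for the 3 x 3 window starting at column 1, row 0. Because every window reuses the same `P` and `Q`, the per-window work is independent of window size, which is the property that makes the computation attractive to parallelize.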
Pages: 19737 - 19751
Page count: 14