cuRCD: Region covariance descriptor CUDA implementation

Citations: 0

Authors:

M. Ali Asan

Adnan Ozsoy

Affiliations:

[1] Hacettepe University, Computer Engineering Department

Source:

Multimedia Tools and Applications | 2021 / Vol. 80

Keywords:

Parallel region covariance; CUDA; Real time object detection; GPGPU

DOI:

Not available


Abstract:

Region covariance is a robust feature descriptor that combines even the simplest image features, such as intensity and gradient, into a well-performing descriptor for image regions. Beyond its robustness, it requires many identical, heavy computations on different parts of the input data, which makes it a good candidate for parallel execution. In this manuscript, we present a real-time parallel implementation of region covariance which, to the best of our knowledge, is the first in the literature. We compared it against existing implementations and achieved execution 6 times faster than a vectorized parallel CPU implementation, providing the speedup necessary for real-time processing. Additionally, we improved the existing integral image calculation method on CUDA, reducing memory usage by 50% and achieving the fastest computation speed among existing solutions, and we improved the covariance matrix comparison by using a distance metric that is lightweight to compute and easy to implement.
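For readers unfamiliar with the technique, the following is a minimal sequential sketch of the commonly used integral-image formulation of region covariance, written in plain Python. It is not the authors' CUDA implementation and does not reproduce their memory or speed optimizations. The function names (`integral_image`, `box_sum`, `build_tables`, `region_covariance`) and the half-open rectangle convention are assumptions made for illustration only.

```python
def integral_image(channel):
    """Return a (H+1) x (W+1) summed-area table with a zero first row and column."""
    height, width = len(channel), len(channel[0])
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += channel[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 with four lookups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def build_tables(features):
    """Build first-order and second-order integral images once per image.

    features is a list of d channels (for example intensity and gradients),
    each an H x W list of lists. Only pairs with i <= j are stored because
    the covariance matrix is symmetric.
    """
    d = len(features)
    height, width = len(features[0]), len(features[0][0])
    first = [integral_image(f) for f in features]
    second = {}
    for i in range(d):
        for j in range(i, d):
            product = [[features[i][y][x] * features[j][y][x]
                        for x in range(width)] for y in range(height)]
            second[(i, j)] = integral_image(product)
    return first, second


def region_covariance(first, second, top, left, bottom, right):
    """Return the d x d sample covariance of the region (needs >= 2 pixels)."""
    d = len(first)
    n = (bottom - top) * (right - left)
    sums = [box_sum(t, top, left, bottom, right) for t in first]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            cross = box_sum(second[(i, j)], top, left, bottom, right)
            value = (cross - sums[i] * sums[j] / n) / (n - 1)
            cov[i][j] = cov[j][i] = value
    return cov
```

Once `build_tables` has run for an image, `region_covariance` can be called for any rectangle at a cost that depends only on the number of feature channels, not on the region size. Every region is processed by the same independent arithmetic, which is the property the abstract points to when it calls the descriptor a good candidate for parallel execution.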


Pages: 19737 - 19751

Page count: 14

50 related records in total

[41] A CUDA Implementation of the PageRank Pipeline Benchmark
Bisson, Mauro
Phillips, Everett
Fatica, Massimiliano
2016 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2016
[42] The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better
Parker, Christopher
Daiter, Matthew
Omar, Kareem
Levi, Gil
Hassner, Tal
COMPUTER VISION - ECCV 2016 WORKSHOPS, PT III, 2016, 9915 : 685 - 697
[43] A CUDA Implementation of DWT for JPEG 2000 Codec
Kurosaki, Masayuki
Matsuo, Masateru
Kuroki, Yoshimitsu
Nagao, Yuhei
Sai, Baiko
Ochi, Hiroshi
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2011, E94A (11) : 2358 - 2360
[44] A CUDA Implementation of the Standard Particle Swarm Optimization
Hussain, Md. Maruf
Hattori, Hiroshi
Fujimoto, Noriyuki
PROCEEDINGS OF 2016 18TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC), 2016 : 219 - 226
[45] A CUDA implementation of the Continuous Space Language Model
Thompson, Elizabeth A.
Anderson, Timothy R.
JOURNAL OF SUPERCOMPUTING, 2014, 68 (01) : 65 - 86
[46] CUDA accelerated implementation of parallel dynamic relaxation
Ivanyi, P.
ADVANCES IN ENGINEERING SOFTWARE, 2018, 125 : 200 - 208
[47] Implementation of a maximum clique search procedure on CUDA
Daniluk, Pawel
Firlik, Grzegorz
Lesyng, Bogdan
JOURNAL OF HEURISTICS, 2019, 25 (02) : 247 - 271
[48] A CUDA implementation of the Continuous Space Language Model
Elizabeth A. Thompson
Timothy R. Anderson
The Journal of Supercomputing, 2014, 68 : 65 - 86
[49] cuSCNN: an Efficient CUDA Implementation of Sparse CNNs
Elgammal, Mohamed A.
Awad, Omar M.
Vivancos, Isak Edo
Moshovos, Andreas
Betz, Vaughn
THE PROCEEDINGS OF THE 13TH INTERNATIONAL SYMPOSIUM ON HIGHLY EFFICIENT ACCELERATORS AND RECONFIGURABLE TECHNOLOGIES, HEART 2023, 2023 : 107 - 113
[50] A lightweight BLASTP and its implementation on CUDA GPUs
Liang-Tsung Huang
Kai-Cheng Wei
Chao-Chin Wu
Chao-Yu Chen
Jian-An Wang
The Journal of Supercomputing, 2021, 77 : 322 - 342
