Accelerated parametric chamfer alignment using a parallel, pipelined GPU realization

Cited by: 0

Authors:

Ahmed Elliethy

Gaurav Sharma

Affiliation:

[1] University of Rochester, Department of Electrical and Computer Engineering

Source:

Journal of Real-Time Image Processing | 2019 / Vol. 16

Keywords:

Chamfer alignment; Pipelining; Parametric registration; GPU acceleration

DOI:

Not available

Abstract:

Parametric chamfer alignment (PChA) is commonly employed for aligning an observed set of points with a corresponding set of reference points. PChA estimates optimal geometric transformation parameters that minimize an objective function formulated as the sum of the squared distances from each transformed observed point to its closest reference point. A distance transform enables efficient computation of the (squared) distances, and the objective function minimization is commonly performed via the Levenberg–Marquardt (LM) nonlinear least squares iterative optimization algorithm. The point-wise computations of the objective function, gradient, and Hessian approximation required for the LM iterations make PChA computationally demanding for large-scale datasets. We propose an acceleration of the PChA via a parallelized and pipelined realization that is particularly well suited for large-scale datasets and for modern GPU architectures. Specifically, we partition the observed points among the GPU blocks and decompose the expensive LM calculations in correspondence with the GPU’s single-instruction multiple-thread architecture to significantly speed up this bottleneck step for PChA on large-scale datasets. Additionally, by reordering computations, we propose a novel pipelining of the LM algorithm that offers further speedup by exploiting the low arithmetic latency of the GPU compared with its high global memory access latency. Results obtained on two different platforms for both 2D and 3D large-scale point datasets from our ongoing research demonstrate that the proposed PChA GPU implementation provides a significant speedup over its single CPU counterpart.
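
As a rough illustration of the objective described in the abstract, the following minimal Python sketch precomputes a squared distance transform for a set of reference points and evaluates the sum of squared distances for translated observed points. It is not the authors' implementation: the function names, the toy point sets, the translation-only transformation, and the grid size are all assumptions made for illustration, and a brute-force search over integer shifts stands in for the Levenberg–Marquardt iterations and the GPU parallelization that the paper describes.

```python
def squared_distance_transform(reference, width, height):
    """Squared distance from every grid cell to its closest reference point.

    Brute force for clarity; real systems use a fast distance transform.
    """
    return [
        [min((x - rx) ** 2 + (y - ry) ** 2 for rx, ry in reference)
         for x in range(width)]
        for y in range(height)
    ]


def chamfer_objective(observed, dt, tx, ty):
    """Sum of squared distances after translating observed points by (tx, ty)."""
    height, width = len(dt), len(dt[0])
    total = 0
    for x, y in observed:
        # Round to the nearest grid cell and clamp to the grid bounds.
        col = min(max(round(x + tx), 0), width - 1)
        row = min(max(round(y + ty), 0), height - 1)
        total += dt[row][col]
    return total


def best_translation(observed, dt, max_shift):
    """Exhaustive search over integer shifts (a stand-in for LM, by assumption)."""
    shifts = range(-max_shift, max_shift + 1)
    candidates = (
        (chamfer_objective(observed, dt, tx, ty), tx, ty)
        for tx in shifts
        for ty in shifts
    )
    cost, tx, ty = min(candidates)
    return tx, ty, cost


# Hypothetical toy data: the observed points are the reference points
# shifted by (-3, +2), so the recovering translation is (3, -2).
reference = [(5, 5), (10, 5), (10, 12), (4, 14)]
observed = [(x - 3, y + 2) for x, y in reference]
dt = squared_distance_transform(reference, 20, 20)
print(best_translation(observed, dt, 5))  # prints (3, -2, 0)
```

Because each observed point contributes an independent term to the sum, the per-point lookups in `chamfer_objective` are the kind of work that can be partitioned across parallel threads, which is the property the abstract exploits on the GPU.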

Pages: 1661-1680

Page count: 19
