Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

Cited by: 109

Authors:

Qasaimeh, Murad [1]

Denolf, Kristof [2]

Lo, Jack [2]

Vissers, Kees [2]

Zambreno, Joseph [1]

Jones, Phillip H. [1]

Affiliations:

[1] Iowa State Univ, Ames, IA 50011 USA

[2] Xilinx Res Labs, San Jose, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) | 2019

Keywords:

Embedded Vision; GPUs; FPGAs; CPUs; Energy Efficiency;

DOI:

10.1109/icess.2019.8782524

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Developing high-performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g., multi-core CPUs, GPUs, and FPGAs) and their associated vendor-optimized vision libraries, it is a challenge for developers to navigate this fragmented solution space. To help developers determine which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study covers three commonly used hardware accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU, and ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks, and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels, while for more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. We also observe that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
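The energy/frame metric quoted in the abstract can be illustrated with a minimal Python sketch. It assumes energy per frame is average power divided by throughput, which is the usual definition but is not spelled out in this record. The function names and all power and frame-rate figures below are hypothetical placeholders, not measurements from the paper.

```python
def energy_per_frame_mj(avg_power_w: float, frames_per_second: float) -> float:
    """Energy per frame in millijoules: average power (W) / throughput (frames/s)."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return avg_power_w / frames_per_second * 1000.0


def reduction_ratio(baseline_mj: float, candidate_mj: float) -> float:
    """How many times less energy per frame the candidate uses than the baseline."""
    if candidate_mj <= 0:
        raise ValueError("candidate_mj must be positive")
    return baseline_mj / candidate_mj


# Hypothetical (average power in W, throughput in frames/s) per platform.
# These values are made up for illustration only.
platforms = {
    "cpu": (4.0, 20.0),
    "gpu": (6.0, 60.0),
    "fpga": (5.0, 100.0),
}

energy = {
    name: energy_per_frame_mj(power, fps)
    for name, (power, fps) in platforms.items()
}

if __name__ == "__main__":
    best = min(energy, key=energy.get)
    for name, mj in energy.items():
        ratio = reduction_ratio(mj, energy[best])
        print(f"{name}: {mj:.1f} mJ/frame, {ratio:.1f}x the energy of {best}")
```

A platform that draws more power can still have the lowest energy per frame if its throughput is high enough, which is why the metric combines both quantities.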

Pages: 8

50 items in total

[21] ESIMD GPU Implementations of Deep Learning Sparse Matrix Kernels
Zubair, Mohammad
Bauinger, Christoph
EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 33 - 46
[22] High Throughput Implementations of Cryptography algorithms on GPU and FPGA
Venugopal, Vivek
Shila, Devu Manikantan
2013 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE (I2MTC), 2013 : 723 - 727
[23] Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
María Angélica Dávila Guzmán
Raúl Nozal
Rubén Gran Tejero
María Villarroya-Gaudó
Darío Suárez Gracia
Jose Luis Bosque
The Journal of Supercomputing, 2019, 75 : 1732 - 1746
[24] Accelerating Robot Dynamics Gradients on a CPU, GPU, and FPGA
Plancher, Brian
Neuman, Sabrina M.
Bourgeat, Thomas
Kuindersma, Scott
Devadas, Srinivas
Reddi, Vijay Janapa
IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 2335 - 2342
[25] Performance of CPU/GPU compiler directives on ISO/TTI kernels
Sayan Ghosh
Terrence Liao
Henri Calandra
Barbara M. Chapman
Computing, 2014, 96 : 1149 - 1162
[26] Performance of CPU/GPU compiler directives on ISO/TTI kernels
Ghosh, Sayan
Liao, Terrence
Calandra, Henri
Chapman, Barbara M.
COMPUTING, 2014, 96 (12) : 1149 - 1162
[27] Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
Davila Guzman, Maria Angelica
Nozal, Raul
Gran Tejero, Ruben
Villarroya-Gaudo, Maria
Suarez Gracia, Dario
Luis Bosque, Jose
JOURNAL OF SUPERCOMPUTING, 2019, 75 (03) : 1732 - 1746
[28] C to Cellular Automata and Execution on CPU, GPU and FPGA
Drieseberg, Jens
Siemers, Christian
JOURNAL OF CELLULAR AUTOMATA, 2016, 11 (01) : 7 - 20
[29] PERFORMANCE COMPARISON OF FPGA, GPU AND CPU IN IMAGE PROCESSING
Asano, Shuichi
Maruyama, Tsutomu
Yamaguchi, Yoshiki
FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009 : 126 - 131
[30] Optimization strategies for CPU and GPU implementations of a smoothed particle hydrodynamics method
Dominguez, Jose M.
Crespo, Alejandro J. C.
Gomez-Gesteira, Moncho
COMPUTER PHYSICS COMMUNICATIONS, 2013, 184 (03) : 617 - 627