Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

Cited by: 109
Authors
Qasaimeh, Murad [1]
Denolf, Kristof [2]
Lo, Jack [2]
Vissers, Kees [2]
Zambreno, Joseph [1]
Jones, Phillip H. [1]
Affiliations
[1] Iowa State Univ, Ames, IA 50011 USA
[2] Xilinx Res Labs, San Jose, CA USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) | 2019
Keywords
Embedded Vision; GPUs; FPGAs; CPUs; Energy Efficiency;
DOI
10.1109/icess.2019.8782524
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Developing high-performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g., multi-core CPUs, GPUs, and FPGAs) and their associated vendor-optimized vision libraries, it is a challenge for developers to navigate this fragmented solution space. To help developers determine which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study covers three commonly used hardware accelerators for embedded vision applications (ARM57 CPU, Jetson TX2 GPU, and ZCU102 FPGA), using their vendor-optimized vision libraries: OpenCV, VisionWorks, and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels. For more complicated kernels and complete vision pipelines, however, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. We also observe that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
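The energy/frame reduction ratio quoted in the abstract can be illustrated with a minimal sketch. It assumes that energy per frame is average power divided by frame rate, and that the reduction ratio is the baseline's energy per frame divided by the candidate's. The abstract does not spell out either definition, the function names are hypothetical, and the numbers are invented for illustration and are not taken from the paper.

```python
def energy_per_frame(avg_power_watts: float, frames_per_second: float) -> float:
    """Energy per frame in joules: average power divided by throughput (assumed definition)."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return avg_power_watts / frames_per_second


def reduction_ratio(baseline_joules: float, candidate_joules: float) -> float:
    """How many times less energy per frame the candidate uses than the baseline."""
    if candidate_joules <= 0:
        raise ValueError("candidate_joules must be positive")
    return baseline_joules / candidate_joules


# Hypothetical measurements for two platforms running the same kernel.
# These values are illustrative only and do not come from the paper.
baseline = energy_per_frame(avg_power_watts=4.0, frames_per_second=50.0)
candidate = energy_per_frame(avg_power_watts=5.0, frames_per_second=125.0)
ratio = reduction_ratio(baseline, candidate)
print(f"energy/frame reduction ratio: {ratio:.1f}x")
```

Under these assumptions, a platform that draws more power can still come out ahead on energy per frame if its throughput is proportionally higher.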
Pages: 8
Related papers
50 in total
  • [21] ESIMD GPU Implementations of Deep Learning Sparse Matrix Kernels
    Zubair, Mohammad
    Bauinger, Christoph
    EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 33 - 46
  • [22] High Throughput Implementations of Cryptography algorithms on GPU and FPGA
    Venugopal, Vivek
    Shila, Devu Manikantan
    2013 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE (I2MTC), 2013, : 723 - 727
  • [23] Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
    María Angélica Dávila Guzmán
    Raúl Nozal
    Rubén Gran Tejero
    María Villarroya-Gaudó
    Darío Suárez Gracia
    Jose Luis Bosque
    The Journal of Supercomputing, 2019, 75 : 1732 - 1746
  • [24] Accelerating Robot Dynamics Gradients on a CPU, GPU, and FPGA
    Plancher, Brian
    Neuman, Sabrina M.
    Bourgeat, Thomas
    Kuindersma, Scott
    Devadas, Srinivas
    Reddi, Vijay Janapa
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02): : 2335 - 2342
  • [25] Performance of CPU/GPU compiler directives on ISO/TTI kernels
    Sayan Ghosh
    Terrence Liao
    Henri Calandra
    Barbara M. Chapman
    Computing, 2014, 96 : 1149 - 1162
  • [26] Performance of CPU/GPU compiler directives on ISO/TTI kernels
    Ghosh, Sayan
    Liao, Terrence
    Calandra, Henri
    Chapman, Barbara M.
    COMPUTING, 2014, 96 (12) : 1149 - 1162
  • [27] Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
    Davila Guzman, Maria Angelica
    Nozal, Raul
    Gran Tejero, Ruben
    Villarroya-Gaudo, Maria
    Suarez Gracia, Dario
    Luis Bosque, Jose
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (03): : 1732 - 1746
  • [28] C to Cellular Automata and Execution on CPU, GPU and FPGA
    Drieseberg, Jens
    Siemers, Christian
    JOURNAL OF CELLULAR AUTOMATA, 2016, 11 (01) : 7 - 20
  • [29] PERFORMANCE COMPARISON OF FPGA, GPU AND CPU IN IMAGE PROCESSING
    Asano, Shuichi
    Maruyama, Tsutomu
    Yamaguchi, Yoshiki
    FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 126 - 131
  • [30] Optimization strategies for CPU and GPU implementations of a smoothed particle hydrodynamics method
    Dominguez, Jose M.
    Crespo, Alejandro J. C.
    Gomez-Gesteira, Moncho
    COMPUTER PHYSICS COMMUNICATIONS, 2013, 184 (03) : 617 - 627