Hardware-Efficient Template-Based Deep CNNs Accelerator Design

Cited by: 1
Authors
Alhussain, Azzam [1 ]
Lin, Mingjie [1 ]
Affiliations
[1] Univ Cent Florida, Coll Engn & Comp Sci, Orlando, FL 32816 USA
Keywords
CNN; FPGA; Deep Learning; Accelerator design;
DOI
10.1109/NAS55553.2022.9925552
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Acceleration of Convolutional Neural Networks (CNNs) on edge devices has recently achieved remarkable performance in image classification and object detection applications. This paper proposes an efficient and scalable CNN-based SoC-FPGA accelerator design that takes pre-trained weights with 16-bit fixed-point quantization and a target hardware specification to generate an optimized template capable of achieving a better trade-off between performance and resource utilization. The template analyzes the computational workload, data dependency, and external memory bandwidth, and uses loop tiling transformation along with dataflow modeling to convert convolutional and fully connected layers into vector multiplication between input and output feature maps, which results in a single on-chip compute unit. Furthermore, the accelerator was evaluated on the AlexNet, VGG16, and LeNet networks and ran at 200 MHz with a peak performance of 230 GOP/s, depending on the ZYNQ board and on state-space exploration of different compute unit configurations during simulation and synthesis. Lastly, the proposed methodology was benchmarked against previous work on Ultra96 to measure the performance gain.
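The loop tiling idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' template: the function names `conv2d_direct` and `conv2d_tiled` and the tile sizes `tile_out` and `tile_in` are hypothetical, and the sketch assumes stride 1, no padding, and integer data. It only shows how tiling the output-channel and input-channel loops reduces a convolution to repeated small vector multiply-accumulate steps that a single compute unit could serve.

```python
def conv2d_direct(x, w):
    """Reference convolution. x[C_in][H][W], w[C_out][C_in][K][K], stride 1, no padding."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    h_out, w_out = h - k + 1, wd - k + 1
    y = [[[0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    for co in range(c_out):
        for ci in range(c_in):
            for r in range(h_out):
                for c in range(w_out):
                    for i in range(k):
                        for j in range(k):
                            y[co][r][c] += w[co][ci][i][j] * x[ci][r + i][c + j]
    return y


def conv2d_tiled(x, w, tile_out=2, tile_in=2):
    """Same convolution with the channel loops tiled (hypothetical tile sizes)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    h_out, w_out = h - k + 1, wd - k + 1
    y = [[[0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    # Outer loops walk over tiles of output and input channels.
    for co0 in range(0, c_out, tile_out):
        co_end = min(co0 + tile_out, c_out)
        for ci0 in range(0, c_in, tile_in):
            ci_end = min(ci0 + tile_in, c_in)
            for r in range(h_out):
                for c in range(w_out):
                    for i in range(k):
                        for j in range(k):
                            # Inner "compute unit": for each output channel in the
                            # tile, a dot product over the input-channel tile.
                            for co in range(co0, co_end):
                                acc = 0
                                for ci in range(ci0, ci_end):
                                    acc += w[co][ci][i][j] * x[ci][r + i][c + j]
                                y[co][r][c] += acc
    return y
```

Under these assumptions, a fully connected layer corresponds to the special case of a 1x1 kernel on a 1x1 feature map, so the same inner step covers both layer types. On hardware, the two innermost loops would typically be unrolled into parallel multipliers, with the tile sizes chosen to fit the available resources.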
Pages: 9 - 12
Number of pages: 4
Related papers
50 in total
  • [21] Deep Template-based Object Instance Detection
    Mercier, Jean-Philippe
    Garon, Mathieu
    Giguere, Philippe
    Lalonde, Jean-Francois
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1506 - 1515
  • [22] Computationally Efficient Template-Based Face Recognition
    Wu, Yue
    AbdAlmageed, Wael
    Rawls, Stephen
    Natarajan, Prem
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 1424 - 1429
  • [23] Deep template-based protein structure prediction
    Wu, Fandi
    Xu, Jinbo
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2021, 17 (05)
  • [24] An Efficient FPGA Accelerator Design for Optimized CNNs Using OpenCL
    Vemparala, Manoj Rohit
    Frickenstein, Alexander
    Stechele, Walter
    [J]. ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2019, 2019, 11479 : 236 - 249
  • [25] Design and Optimization of Hardware-Efficient Filters for Active Safety Algorithms
    Dlugosz, Rafal Tomasz
    Szulc, Michal
    Kolasa, Marta
    Skruch, Pawel
    Kogut, Krzysztof
    Markiewicz, Pawel
    Orlowski, Mateusz
    Rozewicz, Maciej
    Ryszka, Anna
    Sasin, Dominik
    Talaska, Tomasz
    [J]. SAE INTERNATIONAL JOURNAL OF PASSENGER CARS-ELECTRONIC AND ELECTRICAL SYSTEMS, 2015, 8 (01) : 41 - 50
  • [26] Template-Based Integrated Design Environment for Rocket Design
    Hu, Chunsheng
    Xu, Chengdong
    [J]. ADVANCED SCIENCE LETTERS, 2011, 4 (8-10) : 3187 - 3192
  • [27] Hardware-Efficient Barrel Shifter Design Using Customized Dynamic Logic Based MUX
    Chon, Dain
    Yang, Yoojeong
    Choi, Hayoung
    Choi, Woong
    [J]. 2022 19TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2022, : 59 - 60
  • [28] Design and implementation of hardware-efficient architecture for saturation-based image dehazing algorithm
    Anuja George
    E. P. Jayakumar
    [J]. Journal of Real-Time Image Processing, 2023, 20
  • [29] Design of Scalable Hardware-Efficient Compressive Sensing Image Sensors
    Leitner, Stefan
    Wang, Haibo
    Tragoudas, Spyros
    [J]. IEEE SENSORS JOURNAL, 2018, 18 (02) : 641 - 651
  • [30] Algorithm and Architecture Design of a Hardware-Efficient Image Dehazing Engine
    Lee, Yu-Hsuan
    Wu, Bo-Hua
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (07) : 2146 - 2161