An Efficient FPGA Accelerator Design for Optimized CNNs Using OpenCL

Times cited: 3
Authors
Vemparala, Manoj Rohit [1]
Frickenstein, Alexander [1]
Stechele, Walter [2]
Affiliations
[1] BMW Grp, Munich, Germany
[2] Tech Univ Munich, Munich, Germany
Keywords
FPGA; CNN; Winograd transform; HLS; Quantization
DOI
10.1007/978-3-030-18656-2_18
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Convolutional Neural Networks (CNNs) require highly parallel Hardware (HW) accelerators, such as Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), to build the low-latency solutions needed for image processing applications. FPGAs can provide the right balance between flexibility, performance and energy efficiency. Designing an FPGA-based accelerator has traditionally required a tedious Register Transfer Level (RTL) design flow. To improve design productivity, the proposed work uses High-Level Synthesis (HLS) from an OpenCL description to generate the FPGA bitstream for the CNN model. A 2D Winograd transformation is integrated into the pipeline to reduce the overall number of Multiply and Accumulate (MAC) operations in the CNN. Instead of increasing the batch size to improve throughput, this work discusses a mixed-precision approach that can counter the limited memory bandwidth available to the CNN. The obtained results are competitive with other FPGA-based implementations proposed in the literature. The proposed accelerator achieves more than 1.9x higher energy efficiency than an embedded Nvidia Jetson TX1 implementation of VGG-16.
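To make the MAC-reduction claim concrete, the following is a minimal sketch of Winograd minimal filtering for a single tile, not the authors' OpenCL kernel. It assumes the common F(2×2, 3×3) variant with its standard transform matrices; the abstract does not say which tile size the accelerator uses, and the function names (`winograd_f2x2_3x3`, `direct_2x2_3x3`, `matmul`, `transpose`) and the choice of plain Python are illustrative only. A 4×4 input tile and a 3×3 filter yield a 2×2 output with 16 element-wise products, versus 2 × 2 × 9 = 36 multiplications for the direct sliding window, a 2.25x reduction per tile.

```python
# Hedged sketch: Winograd F(2x2, 3x3) for one tile, checked against direct
# correlation. Assumption: the standard F(2x2, 3x3) matrices below; the paper's
# exact Winograd variant is not stated in the abstract.

# Input transform B^T (4x4), filter transform G (4x3), output transform A^T (2x4).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def winograd_f2x2_3x3(d, g):
    """4x4 input tile d, 3x3 filter g -> 2x2 output (CNN-style correlation)."""
    U = matmul(matmul(G, g), transpose(G))       # G g G^T, computable once per filter
    V = matmul(matmul(BT, d), transpose(BT))     # B^T d B; entries 0/+-1, adds only in HW
    M = [[U[i][j] * V[i][j] for j in range(4)]   # 16 element-wise products per tile
         for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))  # A^T M A; entries 0/+-1, adds only in HW


def direct_2x2_3x3(d, g):
    """Reference sliding-window result: 2 * 2 * 9 = 36 multiplications."""
    return [[sum(d[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(direct_2x2_3x3(d, g))
    print(winograd_f2x2_3x3(d, g))
```

In a hardware pipeline the filter transform U would usually be precomputed per filter, and the input and output transforms use only additions and subtractions, which is the usual motivation for Winograd on FPGAs: the element-wise products are where the scarce multiplier (DSP) resources are spent.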
Pages: 236-249
Number of pages: 14