FPGA-Based Unified Accelerator for Convolutional Neural Network and Vision Transformer

Cited by: 0
Authors
Li T. [1 ]
Zhang F. [1 ]
Wang S. [2 ]
Cao W. [3 ]
Chen L. [1 ]
Affiliations
[1] Institute of Information Technology, Information Engineering University, Zhengzhou
[2] The 95072 Unit of PLA Air Force, Nanning
[3] Institute for Big Data, Fudan University, Shanghai
Keywords
Computer vision; Convolutional Neural Network (CNN); Field Programmable Gate Array (FPGA); Hardware accelerator; Transformer
DOI
10.11999/JEIT230713
Abstract
Traditional Field Programmable Gate Array (FPGA)-based Convolutional Neural Network (CNN) accelerators for computer vision are poorly suited to Vision Transformer networks, so a unified FPGA accelerator for CNNs and Transformers is proposed. First, a generalized computation mapping method for FPGA is proposed based on the shared characteristics of convolution and the attention mechanism. Second, a nonlinear and normalization acceleration unit is proposed to support the multiple nonlinear operations found in computer vision networks. The accelerator design is then implemented on a Xilinx XCVU37P FPGA. Experimental results show that the proposed nonlinear acceleration unit improves throughput with only a small accuracy loss. ResNet-50 and ViT-B/16 achieve 589.94 GOPS and 564.76 GOPS, respectively, on the proposed FPGA accelerator. Compared with a GPU implementation, energy efficiency is improved by factors of 5.19 and 7.17, respectively. Compared with other large FPGA-based designs, energy efficiency is significantly improved, and computing efficiency is 8.02%~177.53% higher than that of other FPGA accelerators. © 2024 Science Press. All rights reserved.
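The abstract's "generalized computation mapping" rests on a well-known observation the paper builds on: both convolution (via im2col lowering) and self-attention reduce to matrix multiplication, so one GEMM-style compute array can serve both workloads. The sketch below is illustrative only, not the paper's implementation; the function names and shapes are assumptions for the example.

```python
import numpy as np

def im2col_conv(x, w, stride=1):
    """2D convolution lowered to a single GEMM via im2col.
    x: (C, H, W) input feature map; w: (K, C, R, S) kernels.
    This lowering is what lets a matrix-multiply array compute convolutions."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    out_h = (H - R) // stride + 1
    out_w = (W - S) // stride + 1
    # Unfold each receptive field into one column of the patch matrix.
    cols = np.empty((C * R * S, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+R, j*stride:j*stride+S]
            cols[:, i * out_w + j] = patch.ravel()
    # One GEMM: (K, C*R*S) @ (C*R*S, out_h*out_w)
    return (w.reshape(K, -1) @ cols).reshape(K, out_h, out_w)

def attention(q, k, v):
    """Scaled dot-product attention: two GEMMs plus a softmax,
    so it maps onto the same compute array as the lowered convolution."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

In hardware, the softmax and normalization steps are exactly the nonlinear operations the paper's dedicated acceleration unit targets; the GEMMs in both paths share the processing-element array.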
Pages: 2663-2672
Number of pages: 9
References
24 records in total
  • [1] SIMONYAN K, ZISSERMAN A., Very deep convolutional networks for large-scale image recognition[C], 3rd International Conference on Learning Representations, (2015)
  • [2] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, Et al., Deep residual learning for image recognition[C], 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, (2016)
  • [3] SZEGEDY C, LIU Wei, JIA Yangqing, Et al., Going deeper with convolutions[C], 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, (2015)
  • [4] VASWANI A, SHAZEER N, PARMAR N, Et al., Attention is all you need[C], The 31st International Conference on Neural Information Processing Systems, pp. 6000-6010, (2017)
  • [5] CARION N, MASSA F, SYNNAEVE G, Et al., End-to-end object detection with transformers[C], The 16th European Conference on Computer Vision, pp. 213-229, (2020)
  • [6] CHEN Ying, KUANG Cheng, Pedestrian re-identification based on CNN and Transformer multi-scale learning[J], Journal of Electronics & Information Technology, 45, 6, pp. 2256-2263, (2023)
  • [7] ZHAI Xiaohua, KOLESNIKOV A, HOULSBY N, Et al., Scaling vision transformers[C], 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1204-1213, (2022)
  • [8] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, Et al., An image is worth 16x16 words: transformers for image recognition at scale, 9th International Conference on Learning Representations, (2021)
  • [9] WANG Teng, GONG Lei, WANG Chao, Et al., ViA: A novel vision-transformer accelerator based on FPGA[J], IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41, 11, pp. 4088-4099, (2022)
  • [10] NAG S, DATTA G, KUNDU S, Et al., ViTA: A vision transformer inference accelerator for edge applications[C], 2023 IEEE International Symposium on Circuits and Systems, pp. 1-5, (2023)