An FPGA-Based Transformer Accelerator Using Output Block Stationary Dataflow for Object Recognition Applications

Cited by: 6
Authors
Zhao, Zhongyu [1, 2]
Cao, Rujian [1, 2]
Un, Ka-Fai [1, 2]
Yu, Wei-Han [1, 2]
Mak, Pui-In [1, 2]
Martins, Rui P. [1, 2, 3]
Affiliations
[1] Univ Macau, State Key Lab Analog & Mixed Signal VLSI, Inst Microelect, Macau, Peoples R China
[2] Univ Macau, Fac Sci & Technol ECE, Macau, Peoples R China
[3] Univ Lisbon, Inst Super Tecn, P-1649004 Lisbon, Portugal
Keywords
Transformers; Energy efficiency; Broadcasting; Convolutional neural networks; Integrated circuit modeling; Field programmable gate arrays; Random access memory; Dataflow; digital accelerator; energy-efficient; field-programmable gate array (FPGA); image recognition; transformer; CNN ACCELERATOR; EFFICIENT; HARDWARE
DOI
10.1109/TCSII.2022.3196055
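Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809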
Abstract
Transformer-based models have great potential to deliver higher accuracy than convolutional neural networks (CNNs) in object recognition applications. Yet a transformer-based model shares weights far less than a CNN does, so it calls for a different dataflow to reduce memory access. This brief proposes a transformer accelerator with an output block stationary (OBS) dataflow that minimizes repeated memory access through block-level and vector-level broadcasting while preserving a high digital signal processor (DSP) utilization rate, leading to higher energy efficiency. It also lowers the memory access bandwidth required for the input and output. Verified on an FPGA, the proposed accelerator evaluates a transformer-in-transformer (TNT) model with a throughput of 728.3 GOPs, corresponding to an energy efficiency of 58.31 GOPs/W.
Pages: 281-285
Number of pages: 5