Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited by: 0
Authors
Hu, Wei [1 ,2 ]
Li, Heyuan [1 ,2 ]
Liu, Fang [3 ,4 ]
Zhong, Zhiyv [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Source
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization;
DOI
10.1007/978-981-97-2387-4_22
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely adopted, and several studies have applied it to computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when training data are insufficient. To address this, researchers have proposed combining convolution modules with self-attention modules so that convolution supplies the inductive biases that self-attention lacks, and many models built on this idea have achieved good results. Traditional CPU architectures, however, cannot fully exploit the parallelism of these models. Among the available computing platforms, the FPGA, with its high degree of parallelism, is a suitable choice for algorithm acceleration. At the same time, we note that combined convolution and self-attention modules have received little attention in terms of acceleration, so customizing computational units on FPGAs to improve model parallelism is a feasible approach. In this paper, we optimize the parallelism of the combined convolution and self-attention model, design algorithmic optimizations for two of its most complex generic nonlinear functions from a hardware-software co-optimization perspective to further reduce hardware complexity and overall system latency, and design the corresponding hardware modules. The design is coded in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that, compared with a conventional design, the ZCU216 FPGA-based design greatly reduces hardware resource consumption, while throughput is improved by 8.82x over a CPU and 1.23x over a GPU.
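The abstract does not name the two nonlinear functions that are optimized. In convolution/self-attention hybrids, softmax is a typical candidate, so the following C sketch illustrates (under that assumption, and not as the authors' actual design) the kind of lookup-table, fixed-point approximation commonly used to reduce the hardware complexity of such functions on an FPGA. The table size, Q-format, and function names here are illustrative assumptions.

/* Minimal sketch, assuming softmax is one of the optimized nonlinear
 * functions: exp() is replaced by a small lookup table and all arithmetic
 * is fixed point, mirroring a hardware-friendly implementation.
 * LUT size and Q-formats are illustrative, not from the paper. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define LUT_BITS  6                  /* 64-entry table: cheap to store on chip */
#define LUT_SIZE  (1 << LUT_BITS)
#define FRAC_BITS 8                  /* Q8 fixed point for input scores */

static uint16_t exp_lut[LUT_SIZE];   /* exp(-x) scaled to Q15, x in [0, 8) */

static void init_exp_lut(void) {
    for (int i = 0; i < LUT_SIZE; ++i) {
        double x = 8.0 * i / LUT_SIZE;           /* cover the range [0, 8) */
        exp_lut[i] = (uint16_t)(exp(-x) * 32767.0);
    }
}

/* Softmax over n Q8 fixed-point scores; results written as Q15 probabilities. */
static void softmax_fixed(const int32_t *scores_q8, uint16_t *out_q15, int n) {
    /* Subtract the max so every exponent argument is non-positive,
     * which also keeps the LUT index in range. */
    int32_t max_q8 = scores_q8[0];
    for (int i = 1; i < n; ++i)
        if (scores_q8[i] > max_q8) max_q8 = scores_q8[i];

    uint32_t e[64];                  /* this sketch assumes n <= 64 */
    uint32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        uint32_t d_q8 = (uint32_t)(max_q8 - scores_q8[i]);   /* >= 0 */
        uint32_t idx  = d_q8 >> (FRAC_BITS + 3 - LUT_BITS);  /* map [0,8) to [0,64) */
        if (idx >= LUT_SIZE) idx = LUT_SIZE - 1;             /* saturate large gaps */
        e[i] = exp_lut[idx];
        sum += e[i];
    }
    for (int i = 0; i < n; ++i)
        out_q15[i] = (uint16_t)(((uint64_t)e[i] << 15) / sum);
}

int main(void) {
    init_exp_lut();
    int32_t scores[4] = {2 << FRAC_BITS, 1 << FRAC_BITS, 0, -(1 << FRAC_BITS)};
    uint16_t probs[4];
    softmax_fixed(scores, probs, 4);
    for (int i = 0; i < 4; ++i)
        printf("p[%d] ~= %.3f\n", i, probs[i] / 32768.0);
    return 0;
}

A table of this size trades a small accuracy loss for the removal of floating-point exponentiation, which is the usual motivation for approximating such functions in FPGA accelerators.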
Pages: 328 - 342
Number of Pages: 15
Related Papers
50 records in total
  • [31] High Performance CNN Accelerators Based on Hardware and Algorithm Co-Optimization
    Yuan, Tian
    Liu, Weiqiang
    Han, Jie
    Lombardi, Fabrizio
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68 (01) : 250 - 263
  • [32] FPGA-Based Software Profiler for Hardware/Software Co-design
    Saad, El-Sayed M.
    Awadalla, Medhat H. A.
    El-Deen, Kareem Ezz
    NRSC: 2009 NATIONAL RADIO SCIENCE CONFERENCE: NRSC 2009, VOLS 1 AND 2, 2009, : 475 - 482
  • [33] Software Profiler for FPGA-Based Hardware/Software Co-design
    Department of Communication, Electronics and Computers, Faculty of Engineering, University of Helwan, Egypt (authors not listed)
    J ENG APPL SCI, 2009, 1: 59 - 76
  • [34] Reduced order model using convolutional auto-encoder with self-attention
    Wu, Pin
    Gong, Siquan
    Pan, Kaikai
    Qiu, Feng
    Feng, Weibing
    Pain, Christopher
    PHYSICS OF FLUIDS, 2021, 33 (07)
  • [35] A Deep Dilated Convolutional Self-attention Model for Multimodal Human Activity Recognition
    Wang, Shengzhi
    Xiao, Shuo
    Wang, Yu
    Jiang, Haifeng
    Zhang, Guopeng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 791 - 797
  • [36] ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device
    Liu, Fang
    Li, Heyuan
    Chen, Ziyu
    Hu, Wei
    He, Yanxiang
    Wang, Fei
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, KSEM 2024, 2024, 14885 : 333 - 345
  • [37] A Self-attention Network Based Node Embedding Model
    Nguyen, Dai Quoc
    Nguyen, Tu Dinh
    Phung, Dinh
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III, 2021, 12459 : 364 - 377
  • [38] Software/Hardware Co-Design Optimization for Sparse Convolutional Neural Networks
    Hu, Wei
    Dong, Yong
    Liu, Fang
    Jiao, Qiang
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 2069 - 2074
  • [39] Software-Hardware Co-Optimization on Partial-Sum Problem for PIM-based Neural Network Accelerator
    Wu, Qizhe
    Tao, Linfeng
    Liang, Huawen
    Yuan, Wei
    Tian, Teng
    Xue, Shuang
    Jin, Xi
    2021 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2021,
  • [40] AttenEpilepsy: A 2D convolutional network model based on multi-head self-attention
    Ma, Shuang
    Wang, Haifeng
    Yu, Zhihao
    Du, Luyao
    Zhang, Ming
    Fu, Qingxi
    ENGINEERING ANALYSIS WITH BOUNDARY ELEMENTS, 2024, 169