Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited: 0
Authors
Hu, Wei [1 ,2 ]
Li, Heyuan [1 ,2 ]
Liu, Fang [3 ,4 ]
Zhong, Zhiyv [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization
DOI
10.1007/978-981-97-2387-4_22
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely adopted, and several studies have applied it to computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when training data are insufficient. To address this, researchers have proposed combining convolution modules with self-attention modules so that convolution supplies the inductive biases that self-attention lacks, and many models built on this idea have achieved good results. Traditional CPU architectures, however, cannot fully exploit the parallelism of these models. Among the available computing platforms, the FPGA, with its high degree of parallelism, is well suited to accelerating such algorithms, yet combined convolution and self-attention modules have received little attention in terms of acceleration. Customizing computational units on an FPGA to improve model parallelism is therefore a feasible approach. In this paper, we optimize the parallelism of a combined convolution and self-attention model and, from a hardware-software co-optimization perspective, design algorithmic optimizations for two of the most complex generic nonlinear functions, together with the corresponding hardware modules, to further reduce hardware complexity and end-to-end latency. The design is implemented in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that the ZCU216 FPGA-based design consumes far fewer hardware resources than a conventional design, while throughput is increased by 8.82x and 1.23x compared to the CPU and GPU baselines, respectively.
Pages: 328-342
Page count: 15
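
The abstract does not give architectural details, so the following NumPy sketch is only a generic illustration, not the authors' design, of how a convolution and self-attention combined block is typically structured: a convolution contributes the local inductive bias, and self-attention then models global dependencies over the resulting feature map. The 3x3 'same' convolution, the single attention head, the tensor shapes, and the explicit softmax are all assumptions made for illustration; the softmax is written out because its exponentials and division are a typical example of the kind of generic nonlinear function that hardware-software co-optimization on an FPGA would target.

```python
# Illustrative sketch only (not the paper's implementation): a generic
# convolution + self-attention combined block in plain NumPy.
import numpy as np

def conv2d_3x3(x, w):
    """Naive 3x3 'same' convolution; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out = w.shape[0]
    _, h, wid = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims
    out = np.zeros((c_out, h, wid))
    for o in range(c_out):
        for i in range(h):
            for j in range(wid):
                out[o, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w[o])
    return out

def softmax(z, axis=-1):
    """Reference softmax; the exp and divide here are the kind of generic
    nonlinear function that FPGA co-optimization typically approximates."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention over a token sequence; tokens: (N, D)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def conv_attention_block(x, w_conv, wq, wk, wv):
    """Convolution (local inductive bias) followed by self-attention
    (global dependencies) over the flattened feature map."""
    feat = np.maximum(conv2d_3x3(x, w_conv), 0.0)  # conv + ReLU
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # (H*W, C) token sequence
    return self_attention(tokens, wq, wk, wv)      # (H*W, C) attended tokens

# Toy usage with random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
out = conv_attention_block(
    x,
    rng.standard_normal((16, 3, 3, 3)) * 0.1,
    *(rng.standard_normal((16, 16)) * 0.1 for _ in range(3)),
)
print(out.shape)  # (64, 16)
```

In general, the convolution and attention matrix multiplications map naturally onto parallel multiply-accumulate arrays on an FPGA, whereas the exponential and division inside the softmax are usually replaced by lookup tables or piecewise approximations; the sketch keeps the exact formulas only for clarity.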