Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited by: 0
Authors
Hu, Wei [1, 2]
Li, Heyuan [1, 2]
Liu, Fang [3, 4]
Zhong, Zhiyv [1, 2]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Source
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization;
DOI
10.1007/978-981-97-2387-4_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely used, and some studies have applied it to computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when training data is insufficient. To address this, researchers have combined convolution modules with self-attention modules so that convolution supplies the inductive biases that self-attention lacks, and many models built on this idea have achieved good results. Traditional CPU architectures, however, cannot fully exploit the parallelism of these models. Among computing platforms, the FPGA is well suited to algorithm acceleration because of its high parallelism, yet the acceleration of combined convolution and self-attention modules has received little attention. Customizing computational units on an FPGA to improve model parallelism is therefore a feasible solution. In this paper, we optimize the parallelism of the combined convolution and self-attention model. From the perspective of hardware-software co-optimization, we also design algorithm-level optimizations for two of the most complex generic nonlinear functions, together with the corresponding hardware modules, to further reduce hardware complexity and overall system latency. The design is coded in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that the ZCU216 FPGA-based design consumes far fewer hardware resources than the conventional design, while throughput is increased by 8.82x and 1.23x compared to the CPU and GPU, respectively.
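To make the idea of a combined convolution and self-attention module concrete, the following is a minimal sketch in plain Python. It is not the authors' model or hardware design: the record does not describe the architecture, so the function names (`local_conv`, `self_attention`, `conv_attention_block`), the depthwise 1D convolution, the absence of learned projections (Q = K = V), and the residual connection are all assumptions chosen for illustration.

```python
import math

def local_conv(tokens, kernel):
    """Depthwise 1D convolution along the token axis, zero padded.

    Hypothetical stand-in for the convolution module: each output token
    is a weighted sum of its neighbours, which supplies locality.
    """
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zeros
                for c in range(d):
                    row[c] += w * tokens[j][c]
        out.append(row)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    Learned projection matrices are omitted to keep the sketch short.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(d)])
    return out

def conv_attention_block(tokens, kernel):
    """Convolution first (local mixing), then attention (global mixing),
    joined by a residual connection. The ordering is an assumption."""
    local = local_conv(tokens, kernel)
    mixed = self_attention(local)
    return [[l + m for l, m in zip(lr, mr)] for lr, mr in zip(local, mixed)]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
result = conv_attention_block(tokens, kernel=[0.25, 0.5, 0.25])
```

Every output row in both stages is computed independently of the others, which is the kind of parallelism the abstract says a CPU exploits poorly and customized FPGA compute units can target. The exponential inside `softmax` is one example of a nonlinear function that is costly in hardware; whether it is one of the two functions the paper optimizes is not stated in this record.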
Pages: 328-342
Number of pages: 15
Related papers
50 in total
  • [1] HSCONN: Hardware-Software Co-Optimization of Self-Attention Neural Networks for Large Language Models
    Liu, Siqin
    Kuve, Prakash Chand
    Karanth, Avinash
    PROCEEDING OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024, : 736 - 741
  • [2] FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization
    Yan, Weian
    Tong, Weiqin
    Zhi, Xiaoli
    IEEE ACCESS, 2020, 8 : 171608 - 171620
  • [3] Hardware and Software Co-optimization for Windows Attention
    Hu, Wei
    Hu, Kejie
    Liu, Fang
    Fan, Jie
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2022, PT III, 2022, 13370 : 656 - 668
  • [4] Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
    Bing LI
    Ying QI
    Ying WANG
    Yinhe HAN
    Science China(Information Sciences), 2025, 68 (03) : 371 - 387
  • [5] Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
    Li, Bing
    Qi, Ying
    Wang, Ying
    Han, Yinhe
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03)
  • [6] Software-Hardware Co-Optimization for CNNs Based on Reconfigurable Devices
    Liu, Fang
    Fan, Zimeng
    He, Yanxiang
    Peng, Min
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 1279 - 1286
  • [7] A Co-Optimization of Software and Hardware for PCIe-Based Small Packet DMA Transfer
    Gu, Xiaotian
    Shao, Lisong
    Bai, Ningfeng
    Zhang, Guosheng
    Zhang, Xinyi
    IEEE EMBEDDED SYSTEMS LETTERS, 2025, 17 (01) : 6 - 9
  • [8] A genetic algorithm based approach for multi-objective hardware/software co-optimization
    Banerjee, Tania
    Gadou, Mohamed
    Ranka, Sanjay
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2016, 10 : 36 - 47
  • [9] Invited: Hardware/Software Co-Synthesis and Co-Optimization for Autonomous Systems
    Chang, Wanli
    Zhao, Shuai
    Burton, Simon
    Wang, Haitong
    Chen, Ting
    Chen, Nan
    Audsley, Neil
    2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 1319 - 1322
  • [10] Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration
    Rashidi, Bahador
    Gao, Chao
    Lu, Shan
    Wang, Zhisheng
    Zhou, Chunhua
    Di Niu
    Sun, Fengyu
    56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023, 2023, : 77 - 90