Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited: 0
Authors
Hu, Wei [1 ,2 ]
Li, Heyuan [1 ,2 ]
Liu, Fang [3 ,4 ]
Zhong, Zhiyv [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Source
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization;
DOI
10.1007/978-981-97-2387-4_22
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely used, and some studies have applied it to the field of computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when data is insufficient. To address this, researchers have proposed combining a convolution module with a self-attention module, so that convolution supplies the inductive bias that self-attention lacks. Many models built on this idea have achieved good results. However, traditional central processor architectures cannot fully exploit the parallelism inherent in these models. Among the available computing platforms, the FPGA, with its high degree of parallelism, is a suitable solution for algorithm acceleration. At the same time, we note that combined convolution and self-attention modules have received little attention in terms of acceleration. Customizing computational units on FPGAs to improve model parallelism is therefore a feasible solution. In this paper, we optimize the parallelism of the combined convolution and self-attention model and, from the perspective of hardware-software co-optimization, design algorithmic optimizations for two of the most computationally complex generic nonlinear functions, further reducing the hardware complexity and the latency of the whole system; we also design the corresponding hardware modules. The design is implemented in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that, compared to a conventional design, the ZCU216 FPGA-based design greatly reduces hardware resource consumption, while throughput is improved by 8.82x and 1.23x over the CPU and GPU, respectively.
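The combined structure the abstract describes can be illustrated with a minimal NumPy sketch: a depthwise convolution branch supplies the local inductive bias, a single-head scaled dot-product attention branch supplies global context, and the two are summed. This is an illustrative hybrid topology under assumed shapes, not the authors' exact architecture, and the stable softmax here stands in for the hardware-friendly nonlinear-function approximations the paper designs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; designs like this paper's typically
    # replace the exp/divide with cheaper hardware approximations.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv1d(x, w):
    # x: (tokens, channels), w: (kernel, channels) depthwise weights.
    # Same-padded depthwise convolution provides the local inductive
    # bias (locality, translation equivariance) that attention lacks.
    k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        out[i] = (xp[i:i + k] * w).sum(axis=0)
    return out

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the token axis.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def combined_block(x, conv_w, wq, wk, wv):
    # Sum the convolution and attention branches; an illustrative
    # way to fuse the two, chosen here for simplicity.
    return depthwise_conv1d(x, conv_w) + self_attention(x, wq, wk, wv)
```

Both branches map naturally to parallel multiply-accumulate arrays on an FPGA; the sequential bottleneck is the softmax, which is why the nonlinear functions are the co-optimization target.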
Pages: 328-342 (15 pages)
Related Papers (items [21]-[30] of 50)
  • [21] OPTIMIZATION OF CONVOLUTIONAL NEURAL NETWORK HARDWARE STRUCTURE BASED ON FPGA
    Zhu, Min
    Kuang, Qiqi
    Yang, Chunling
    Lin, Jianjun
    PROCEEDINGS OF THE 2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2018), 2018, : 1797 - 1802
  • [22] Image Classification based on Self-attention Convolutional Neural Network
    Cai, Xiaohong
    Li, Ming
    Cao, Hui
    Ma, Jingang
    Wang, Xiaoyan
    Zhuang, Xuqiang
    SIXTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2021, 11913
  • [23] Bearing Fault Detection Based on Convolutional Self-Attention Mechanism
    Ye, Ruida
    Wang, Weijie
    Ren, Yuan
    Zhang, Keming
    PROCEEDINGS OF 2020 IEEE 2ND INTERNATIONAL CONFERENCE ON CIVIL AVIATION SAFETY AND INFORMATION TECHNOLOGY (ICCASIT), 2020, : 869 - 873
  • [24] Visualization-Based Software Defect Prediction via Convolutional Neural Network with Global Self-Attention
    Qiu, Shaojian
    Wang, Shaosheng
    Tian, Xuhong
    Huang, Mengyang
    Huang, Qiong
    2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2022, : 189 - 198
  • [25] Hardware and Software Co-optimization for the Initialization Failure of the ReRAM-based Cross-bar Array
    Kim, Youngseok
    Kim, Seyoung
    Yeh, Chun-Chen
    Narayanan, Vijay
    Choi, Jungwook
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2020, 16 (04)
  • [26] AICAS Grand Challenge 2024: Software and Hardware Co-optimization for General Large Language Model Inference on CPU
    Tan, Junfeng
    Yu, Guosheng
    Li, Jianing
    Ma, Xiaohan
    Bao, Fang
    Pan, Evens
    Bian, David
    Li, Yongfu
    Du, Yuan
    Du, Li
    Li, Bo
    Mao, Wei
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024,
  • [27] Double Attention: An Optimization Method for the Self-Attention Mechanism Based on Human Attention
    Zhang, Zeyu
    Li, Bin
    Yan, Chenyang
    Furuichi, Kengo
    Todo, Yuki
    BIOMIMETICS, 2025, 10 (01)
  • [28] Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization
    Qi, Panjie
    Sha, Edwin Hsing-Mean
    Zhuge, Qingfeng
    Peng, Hongwu
    Huang, Shaoyi
    Kong, Zhenglun
    Song, Yuhong
    Li, Bingbing
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [29] Aspect Based Sentiment Analysis with Self-Attention and Gated Convolutional Networks
    Yang, Jian
    Yang, Juan
    PROCEEDINGS OF 2020 IEEE 11TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2020), 2020, : 146 - 149
  • [30] Compound Convolutional and Self-Attention Network for Session-Based Recommendation
    Xiao, Yan
    Huo, Lin
    Computer Engineering and Applications, 2023, 59 (10) : 104 - 113