Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited: 0
Authors
Hu, Wei [1 ,2 ]
Li, Heyuan [1 ,2 ]
Liu, Fang [3 ,4 ]
Zhong, Zhiyv [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization
DOI
10.1007/978-981-97-2387-4_22
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely adopted, and several studies have applied it to computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when training data are insufficient. To address this, researchers have proposed combining convolution modules with self-attention modules so that convolution supplies the inductive biases that self-attention lacks, and many models built on this idea have achieved good results. Traditional CPU architectures, however, cannot fully exploit the parallelism of these models. Among the available computing platforms, the FPGA, with its high degree of parallelism, is well suited to accelerating such algorithms, yet combined convolution and self-attention modules have received little attention in terms of acceleration. Customizing computational units on an FPGA to improve model parallelism is therefore a feasible approach. In this paper, we optimize the parallelism of a combined convolution and self-attention model and, from a hardware-software co-optimization perspective, design algorithmic optimizations for two of the most complex generic nonlinear functions, together with the corresponding hardware modules, to further reduce hardware complexity and end-to-end latency. The design is implemented in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that the ZCU216 FPGA-based design consumes far fewer hardware resources than a conventional design, while throughput is increased by 8.82x and 1.23x compared to the CPU and GPU baselines, respectively.
Pages: 328-342
Page count: 15
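
The abstract does not give architectural details, so the following NumPy sketch is only a generic illustration, not the authors' design, of how a convolution and self-attention combined block is typically structured: a convolution contributes the local inductive bias, and self-attention then models global dependencies over the resulting feature map. The 3x3 'same' convolution, the single attention head, the tensor shapes, and the explicit softmax are all assumptions made for illustration; the softmax is written out because its exponentials and division are a typical example of the kind of generic nonlinear function that hardware-software co-optimization on an FPGA would target.

```python
# Illustrative sketch only (not the paper's implementation): a generic
# convolution + self-attention combined block in plain NumPy.
import numpy as np

def conv2d_3x3(x, w):
    """Naive 3x3 'same' convolution; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out = w.shape[0]
    _, h, wid = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims
    out = np.zeros((c_out, h, wid))
    for o in range(c_out):
        for i in range(h):
            for j in range(wid):
                out[o, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w[o])
    return out

def softmax(z, axis=-1):
    """Reference softmax; the exp and divide here are the kind of generic
    nonlinear function that FPGA co-optimization typically approximates."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention over a token sequence; tokens: (N, D)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def conv_attention_block(x, w_conv, wq, wk, wv):
    """Convolution (local inductive bias) followed by self-attention
    (global dependencies) over the flattened feature map."""
    feat = np.maximum(conv2d_3x3(x, w_conv), 0.0)  # conv + ReLU
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # (H*W, C) token sequence
    return self_attention(tokens, wq, wk, wv)      # (H*W, C) attended tokens

# Toy usage with random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
out = conv_attention_block(
    x,
    rng.standard_normal((16, 3, 3, 3)) * 0.1,
    *(rng.standard_normal((16, 16)) * 0.1 for _ in range(3)),
)
print(out.shape)  # (64, 16)
```

In general, the convolution and attention matrix multiplications map naturally onto parallel multiply-accumulate arrays on an FPGA, whereas the exponential and division inside the softmax are usually replaced by lookup tables or piecewise approximations; the sketch keeps the exact formulas only for clarity.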