Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited: 0
Authors
Hu, Wei [1 ,2 ]
Li, Heyuan [1 ,2 ]
Liu, Fang [3 ,4 ]
Zhong, Zhiyv [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Source
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization;
DOI
10.1007/978-981-97-2387-4_22
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely used, and some studies have applied it to the field of computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when data is insufficient. To address this, researchers have proposed combining a convolution module with a self-attention module, so that convolution supplies the inductive bias that self-attention lacks. Many models built on this idea have achieved good results. However, traditional central processor architectures cannot fully exploit the parallelism inherent in these models. Among the available computing platforms, the FPGA, with its high degree of parallelism, is a suitable solution for algorithm acceleration. At the same time, we note that combined convolution and self-attention modules have received little attention in terms of acceleration. Customizing computational units on FPGAs to improve model parallelism is therefore a feasible solution. In this paper, we optimize the parallelism of the combined convolution and self-attention model and, from the perspective of hardware-software co-optimization, design algorithmic optimizations for two of the most computationally complex generic nonlinear functions, further reducing the hardware complexity and the latency of the whole system; we also design the corresponding hardware modules. The design is implemented in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that, compared to a conventional design, the ZCU216 FPGA-based design greatly reduces hardware resource consumption, while throughput is improved by 8.82x and 1.23x over the CPU and GPU, respectively.
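The combined structure the abstract describes can be illustrated with a minimal NumPy sketch: a depthwise convolution branch supplies the local inductive bias, a single-head scaled dot-product attention branch supplies global context, and the two are summed. This is an illustrative hybrid topology under assumed shapes, not the authors' exact architecture, and the stable softmax here stands in for the hardware-friendly nonlinear-function approximations the paper designs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; designs like this paper's typically
    # replace the exp/divide with cheaper hardware approximations.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv1d(x, w):
    # x: (tokens, channels), w: (kernel, channels) depthwise weights.
    # Same-padded depthwise convolution provides the local inductive
    # bias (locality, translation equivariance) that attention lacks.
    k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        out[i] = (xp[i:i + k] * w).sum(axis=0)
    return out

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the token axis.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def combined_block(x, conv_w, wq, wk, wv):
    # Sum the convolution and attention branches; an illustrative
    # way to fuse the two, chosen here for simplicity.
    return depthwise_conv1d(x, conv_w) + self_attention(x, wq, wk, wv)
```

Both branches map naturally to parallel multiply-accumulate arrays on an FPGA; the sequential bottleneck is the softmax, which is why the nonlinear functions are the co-optimization target.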
Pages: 328-342 (15 pages)
Related Papers (items [21]-[30] of 50)
  • [21] OPTIMIZATION OF CONVOLUTIONAL NEURAL NETWORK HARDWARE STRUCTURE BASED ON FPGA
    Zhu, Min
    Kuang, Qiqi
    Yang, Chunling
    Lin, Jianjun
    PROCEEDINGS OF THE 2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2018), 2018, : 1797 - 1802
  • [22] Image Classification based on Self-attention Convolutional Neural Network
    Cai, Xiaohong
    Li, Ming
    Cao, Hui
    Ma, Jingang
    Wang, Xiaoyan
    Zhuang, Xuqiang
    SIXTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2021, 11913
  • [23] Bearing Fault Detection Based on Convolutional Self-Attention Mechanism
    Ye, Ruida
    Wang, Weijie
    Ren, Yuan
    Zhang, Keming
    PROCEEDINGS OF 2020 IEEE 2ND INTERNATIONAL CONFERENCE ON CIVIL AVIATION SAFETY AND INFORMATION TECHNOLOGY (ICCASIT), 2020, : 869 - 873
  • [24] Visualization-Based Software Defect Prediction via Convolutional Neural Network with Global Self-Attention
    Qiu, Shaojian
    Wang, Shaosheng
    Tian, Xuhong
    Huang, Mengyang
    Huang, Qiong
    2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2022, : 189 - 198
  • [25] Hardware and Software Co-optimization for the Initialization Failure of the ReRAM-based Cross-bar Array
    Kim, Youngseok
    Kim, Seyoung
    Yeh, Chun-Chen
    Narayanan, Vijay
    Choi, Jungwook
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2020, 16 (04)
  • [26] AICAS Grand Challenge 2024: Software and Hardware Co-optimization for General Large Language Model Inference on CPU
    Tan, Junfeng
    Yu, Guosheng
    Li, Jianing
    Ma, Xiaohan
    Bao, Fang
    Pan, Evens
    Bian, David
    Li, Yongfu
    Du, Yuan
    Du, Li
    Li, Bo
    Mao, Wei
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024,
  • [27] Double Attention: An Optimization Method for the Self-Attention Mechanism Based on Human Attention
    Zhang, Zeyu
    Li, Bin
    Yan, Chenyang
    Furuichi, Kengo
    Todo, Yuki
    BIOMIMETICS, 2025, 10 (01)
  • [28] Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization
    Qi, Panjie
    Sha, Edwin Hsing-Mean
    Zhuge, Qingfeng
    Peng, Hongwu
    Huang, Shaoyi
    Kong, Zhenglun
    Song, Yuhong
    Li, Bingbing
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [29] Aspect Based Sentiment Analysis with Self-Attention and Gated Convolutional Networks
    Yang, Jian
    Yang, Juan
    PROCEEDINGS OF 2020 IEEE 11TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2020), 2020, : 146 - 149
  • [30] Compound Convolutional and Self-Attention Network for Session-Based Recommendation
    Xiao, Yan
    Huo, Lin
    Computer Engineering and Applications, 2023, 59 (10) : 104 - 113