Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Cited by: 0
Authors
Hu, Wei [1 ,2 ]
Li, Heyuan [1 ,2 ]
Liu, Fang [3 ,4 ]
Zhong, Zhiyv [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci, Wuhan, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Wuhan Univ, Coll Comp Sci, Wuhan, Peoples R China
[4] Wuhan Inst City, Dept Informat Engn, Wuhan, Peoples R China
Source
Keywords
Attention Mechanism; Conv; Accelerators; FPGA; Co-optimization;
DOI
10.1007/978-981-97-2387-4_22
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely adopted, and several studies have applied it to computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when training data are insufficient. To address this, researchers have proposed combining convolution modules with self-attention modules so that convolution supplies the inductive biases that self-attention lacks, and many models built on this idea have achieved good results. Traditional CPU architectures, however, cannot fully exploit the parallelism of these models. Among the available computing platforms, the FPGA, with its high degree of parallelism, is a suitable choice for algorithm acceleration. At the same time, we note that combined convolution and self-attention modules have received little attention in terms of acceleration, so customizing computational units on FPGAs to improve model parallelism is a feasible approach. In this paper, we optimize the parallelism of the combined convolution and self-attention model, design algorithmic optimizations for two of its most complex generic nonlinear functions from a hardware-software co-optimization perspective to further reduce hardware complexity and overall system latency, and design the corresponding hardware modules. The design is coded in a hardware description language (HDL) and simulated on a Xilinx FPGA. Experimental results show that, compared with a conventional design, the ZCU216 FPGA-based design greatly reduces hardware resource consumption, while throughput is improved by 8.82x over a CPU and 1.23x over a GPU.
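The abstract does not name the two nonlinear functions that are optimized. In convolution/self-attention hybrids, softmax is a typical candidate, so the following C sketch illustrates (under that assumption, and not as the authors' actual design) the kind of lookup-table, fixed-point approximation commonly used to reduce the hardware complexity of such functions on an FPGA. The table size, Q-format, and function names here are illustrative assumptions.

/* Minimal sketch, assuming softmax is one of the optimized nonlinear
 * functions: exp() is replaced by a small lookup table and all arithmetic
 * is fixed point, mirroring a hardware-friendly implementation.
 * LUT size and Q-formats are illustrative, not from the paper. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define LUT_BITS  6                  /* 64-entry table: cheap to store on chip */
#define LUT_SIZE  (1 << LUT_BITS)
#define FRAC_BITS 8                  /* Q8 fixed point for input scores */

static uint16_t exp_lut[LUT_SIZE];   /* exp(-x) scaled to Q15, x in [0, 8) */

static void init_exp_lut(void) {
    for (int i = 0; i < LUT_SIZE; ++i) {
        double x = 8.0 * i / LUT_SIZE;           /* cover the range [0, 8) */
        exp_lut[i] = (uint16_t)(exp(-x) * 32767.0);
    }
}

/* Softmax over n Q8 fixed-point scores; results written as Q15 probabilities. */
static void softmax_fixed(const int32_t *scores_q8, uint16_t *out_q15, int n) {
    /* Subtract the max so every exponent argument is non-positive,
     * which also keeps the LUT index in range. */
    int32_t max_q8 = scores_q8[0];
    for (int i = 1; i < n; ++i)
        if (scores_q8[i] > max_q8) max_q8 = scores_q8[i];

    uint32_t e[64];                  /* this sketch assumes n <= 64 */
    uint32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        uint32_t d_q8 = (uint32_t)(max_q8 - scores_q8[i]);   /* >= 0 */
        uint32_t idx  = d_q8 >> (FRAC_BITS + 3 - LUT_BITS);  /* map [0,8) to [0,64) */
        if (idx >= LUT_SIZE) idx = LUT_SIZE - 1;             /* saturate large gaps */
        e[i] = exp_lut[idx];
        sum += e[i];
    }
    for (int i = 0; i < n; ++i)
        out_q15[i] = (uint16_t)(((uint64_t)e[i] << 15) / sum);
}

int main(void) {
    init_exp_lut();
    int32_t scores[4] = {2 << FRAC_BITS, 1 << FRAC_BITS, 0, -(1 << FRAC_BITS)};
    uint16_t probs[4];
    softmax_fixed(scores, probs, 4);
    for (int i = 0; i < 4; ++i)
        printf("p[%d] ~= %.3f\n", i, probs[i] / 32768.0);
    return 0;
}

A table of this size trades a small accuracy loss for the removal of floating-point exponentiation, which is the usual motivation for approximating such functions in FPGA accelerators.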
Pages: 328 - 342
Number of Pages: 15
Related Papers
50 records in total
  • [31] High Performance CNN Accelerators Based on Hardware and Algorithm Co-Optimization
    Yuan, Tian
    Liu, Weiqiang
    Han, Jie
    Lombardi, Fabrizio
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68 (01) : 250 - 263
  • [32] FPGA-Based Software Profiler for Hardware/Software Co-design
    Saad, El-Sayed M.
    Awadalla, Medhat H. A.
    El-Deen, Kareem Ezz
    NRSC: 2009 NATIONAL RADIO SCIENCE CONFERENCE: NRSC 2009, VOLS 1 AND 2, 2009, : 475 - 482
  • [33] Software Profiler for FPGA-Based Hardware/Software Co-design
    Department of Communication, Electronics and Computers, Faculty of Engineering, University of Helwan, Egypt (authors not listed)
    J ENG APPL SCI, 2009, 1: 59 - 76
  • [34] Reduced order model using convolutional auto-encoder with self-attention
    Wu, Pin
    Gong, Siquan
    Pan, Kaikai
    Qiu, Feng
    Feng, Weibing
    Pain, Christopher
    PHYSICS OF FLUIDS, 2021, 33 (07)
  • [35] A Deep Dilated Convolutional Self-attention Model for Multimodal Human Activity Recognition
    Wang, Shengzhi
    Xiao, Shuo
    Wang, Yu
    Jiang, Haifeng
    Zhang, Guopeng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 791 - 797
  • [36] ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device
    Liu, Fang
    Li, Heyuan
    Chen, Ziyu
    Hu, Wei
    He, Yanxiang
    Wang, Fei
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, KSEM 2024, 2024, 14885 : 333 - 345
  • [37] A Self-attention Network Based Node Embedding Model
    Nguyen, Dai Quoc
    Nguyen, Tu Dinh
    Phung, Dinh
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III, 2021, 12459 : 364 - 377
  • [38] Software/Hardware Co-Design Optimization for Sparse Convolutional Neural Networks
    Hu, Wei
    Dong, Yong
    Liu, Fang
    Jiao, Qiang
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 2069 - 2074
  • [39] Software-Hardware Co-Optimization on Partial-Sum Problem for PIM-based Neural Network Accelerator
    Wu, Qizhe
    Tao, Linfeng
    Liang, Huawen
    Yuan, Wei
    Tian, Teng
    Xue, Shuang
    Jin, Xi
    2021 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2021,
  • [40] AttenEpilepsy: A 2D convolutional network model based on multi-head self-attention
    Ma, Shuang
    Wang, Haifeng
    Yu, Zhihao
    Du, Luyao
    Zhang, Ming
    Fu, Qingxi
    ENGINEERING ANALYSIS WITH BOUNDARY ELEMENTS, 2024, 169