An FPGA-Based Transformer Accelerator With Parallel Unstructured Sparsity Handling for Question-Answering Applications
Cited by: 0
Authors:
Cao, Rujian [1,2]
Zhao, Zhongyu [1,2]
Un, Ka-Fai [1,2]
Yu, Wei-Han [1,2]
Martins, Rui P. [1,2,3]
Mak, Pui-In [1,2]
Affiliations:
[1] Univ Macau, Inst Microelect, State Key Lab Analog & Mixed Signal VLSI, Macau, Peoples R China
[2] Univ Macau, Fac Sci & Technol, ECE, Macau, Peoples R China
[3] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal
Dataflow management provides limited performance improvement for transformer models because they exhibit less weight reuse than convolutional neural networks. The cosFormer reduces computational complexity while achieving performance comparable to the vanilla transformer on natural language processing tasks. However, the unstructured sparsity in the cosFormer makes efficient implementation challenging. This brief proposes a parallel unstructured sparsity handling (PUSH) scheme to compute sparse-dense matrix multiplication (SDMM) efficiently. It transforms unstructured sparsity into structured sparsity and reduces total memory access by balancing the memory accesses of the sparse and dense matrices in the SDMM. We also employ unstructured weight pruning in cooperation with PUSH to further increase the structured sparsity of the model. Verified on an FPGA platform, the proposed accelerator achieves a throughput of 2.82 TOPS and an energy efficiency of 144.8 GOPS/W on the HotpotQA dataset with long sequences.
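The PUSH scheme itself is a hardware dataflow, but the sparse-dense matrix multiplication it accelerates can be illustrated with a minimal software sketch. The CSR-style layout, function names, and the 70% sparsity level below are illustrative assumptions for exposition, not the paper's actual storage format or pruning ratio; the point is only that SDMM skips pruned weights, so memory traffic and compute scale with the number of nonzeros rather than the dense matrix size.

```python
import numpy as np

def dense_to_csr(W):
    """Pack a pruned (sparse) weight matrix into CSR-style arrays.
    Illustrative only -- not the PUSH hardware format."""
    values, col_idx, row_ptr = [], [], [0]
    for row in W:
        for j, w in enumerate(row):
            if w != 0.0:            # store only surviving (unpruned) weights
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values)) # row i's nonzeros: row_ptr[i]..row_ptr[i+1]
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def sdmm(values, col_idx, row_ptr, X):
    """Sparse(W) x Dense(X): each output row accumulates contributions
    from its stored nonzeros only, skipping pruned weights entirely."""
    n_rows = len(row_ptr) - 1
    Y = np.zeros((n_rows, X.shape[1]))
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            Y[i] += values[k] * X[col_idx[k]]
    return Y

# Example: a ~70%-sparse weight matrix times a dense activation matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * (rng.random((8, 8)) > 0.7)
X = rng.standard_normal((8, 4))
assert np.allclose(sdmm(*dense_to_csr(W), X), W @ X)
```

In hardware, the irregular per-row nonzero counts in `row_ptr` are exactly the unstructured-sparsity problem the paper addresses: parallel lanes assigned one row each would idle unevenly, which is why PUSH rebalances the work into a structured form.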