An FPGA-Based Transformer Accelerator With Parallel Unstructured Sparsity Handling for Question-Answering Applications

Cited by: 0
Authors
Cao, Rujian [1 ,2 ]
Zhao, Zhongyu [1 ,2 ]
Un, Ka-Fai [1 ,2 ]
Yu, Wei-Han [1 ,2 ]
Martins, Rui P. [1 ,2 ,3 ]
Mak, Pui-In [1 ,2 ]
Affiliations
[1] Univ Macau, Inst Microelect, State Key Lab Analog & Mixed Signal VLSI, Macau, Peoples R China
[2] Univ Macau, Fac Sci & Technol, ECE, Macau, Peoples R China
[3] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal
Keywords
Sparse matrices; Computational modeling; Transformers; Hardware; Energy efficiency; Circuits; Throughput; Dataflow; digital accelerator; energy-efficient; field-programmable gate array (FPGA); sparsity; transformer; EFFICIENT
DOI
10.1109/TCSII.2024.3462560
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Dataflow management yields only limited performance improvement for the transformer model because it reuses weights less than a convolutional neural network does. The cosFormer reduces computational complexity while achieving performance comparable to the vanilla transformer on natural language processing tasks. However, the unstructured sparsity in the cosFormer makes it challenging to implement efficiently. This brief proposes a parallel unstructured sparsity handling (PUSH) scheme to compute sparse-dense matrix multiplication (SDMM) efficiently. It transforms unstructured sparsity into structured sparsity and reduces the total memory access by balancing the memory accesses of the sparse and dense matrices in the SDMM. We also employ unstructured weight pruning in cooperation with PUSH to further increase the structured sparsity of the model. Verified on an FPGA platform, the proposed accelerator achieves a throughput of 2.82 TOPS and an energy efficiency of 144.8 GOPS/W on the HotpotQA dataset with long sequences.
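For readers unfamiliar with the term, the following is a minimal software sketch of a generic sparse-dense matrix multiplication that skips zero entries of the sparse operand. It is not the PUSH scheme from the brief and says nothing about the accelerator's hardware dataflow; the function names `to_sparse_rows` and `sdmm` and the dictionary-per-row storage format are assumptions chosen only for illustration.

```python
def to_sparse_rows(matrix):
    """Keep only the non-zero entries of each row as {column: value}.
    (Hypothetical storage format, used here for illustration only.)"""
    return [{k: v for k, v in enumerate(row) if v != 0} for row in matrix]


def sdmm(sparse_rows, dense, n_cols):
    """Multiply a sparse matrix (one {column: value} dict per row)
    by a dense matrix given as a list of rows with n_cols columns."""
    out = []
    for row in sparse_rows:
        acc = [0.0] * n_cols
        # Only stored non-zeros are visited, so zero entries cost
        # neither a multiplication nor a read of the dense matrix.
        for k, a in row.items():
            dense_row = dense[k]
            for j in range(n_cols):
                acc[j] += a * dense_row[j]
        out.append(acc)
    return out


A = [[0, 2, 0],
     [1, 0, 0]]
B = [[1, 2],
     [3, 4],
     [5, 6]]
result = sdmm(to_sparse_rows(A), B, n_cols=2)
```

In this sketch the rows of the dense matrix that are fetched depend on where the non-zeros happen to fall, which is the kind of irregular access pattern that unstructured sparsity produces.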
Pages: 4688-4692
Number of pages: 5