Accelerating Transformers with Fourier-Based Attention for Efficient On-Device Inference

Cited by: 0
Authors
Jo, Hyeonjin [1 ]
Sim, Chaerin [2 ]
Park, Jaewoo [1 ]
Lee, Jongeun [3 ]
Affiliations
[1] Ulsan Natl Inst Sci & Technol UNIST, Dept CSE, Ulsan, South Korea
[2] Ulsan Natl Inst Sci & Technol UNIST, Sch New UNIStars, Ulsan, South Korea
[3] Ulsan Natl Inst Sci & Technol UNIST, Dept EE, Ulsan, South Korea
Keywords
Natural Language Processing; FPGA; Fourier Transform; Multi-head Attention
DOI
10.1109/ISOCC59558.2023.10396620
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Multi-head-attention-based transformers have achieved significant success in a wide range of natural language processing applications. However, their quadratic computational complexity and low arithmetic intensity make inference acceleration challenging. To address this issue, attention mechanisms based on Fourier transforms have been proposed. Nevertheless, accelerating the complex-valued arithmetic of the Fourier transform on systolic-array-based edge devices remains unexplored. In this paper, we analyze transformer inference on VTA, a tensor accelerator designed for mobile devices, and propose an efficient mapping of Fourier-based attention onto VTA. Our experimental results demonstrate that on-device inference with Fourier-based attention reduces inference latency by up to 70.7%, and by 29.6% on average, compared to multi-head attention.
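The abstract does not spell out the mapping itself, so the sketch below is only a minimal illustration, not the paper's implementation, of why Fourier-based token mixing suits a systolic array. Assuming an FNet-style layer that keeps Re(F_n X F_d) for a real input X (DFTs along the sequence and hidden dimensions), the complex arithmetic reduces to four real matrix multiplications, exactly the dense-matmul workload an accelerator like VTA executes natively. The helper names (`dft_matrices`, `fourier_mixing`) are illustrative, not from the paper.

```python
import numpy as np

def dft_matrices(n: int):
    """Return (C, S) with F = C + i*S the n-point DFT matrix,
    F[j, k] = exp(-2*pi*i*j*k / n)."""
    k = np.arange(n)
    ang = 2.0 * np.pi * np.outer(k, k) / n
    return np.cos(ang), -np.sin(ang)

def fourier_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing Re(F_n @ x @ F_d) for a real input x of
    shape (seq_len, hidden), decomposed into four real matmuls."""
    n, d = x.shape
    Cn, Sn = dft_matrices(n)
    Cd, Sd = dft_matrices(d)
    # For real x: Re((Cn + i*Sn) @ (x@Cd + i*(x@Sd))) = Cn@x@Cd - Sn@x@Sd,
    # i.e. four real matrix multiplications and one subtraction.
    return Cn @ (x @ Cd) - Sn @ (x @ Sd)

# Sanity check: the four-matmul form matches the FFT-based definition.
x = np.random.default_rng(0).standard_normal((8, 16))
ref = np.fft.fft(np.fft.fft(x, axis=-1), axis=0).real
assert np.allclose(fourier_mixing(x), ref)
```

Because every operation above is a real dense matmul, such a layer maps onto a systolic array without native complex-number support; the O(n^2) DFT-matrix form trades FFT's O(n log n) arithmetic for the high utilization of the array.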
Pages: 203-204 (2 pages)
Related Papers (47 in total)
• [41] Zadeh, Ali Hadi; Edo, Isak; Awad, Omar Mohamed; Moshovos, Andreas. "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference." In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2020), pp. 811-824.
• [42] Bouwman, Job G.; Bakker, Chris J. G. "Alias subtraction more efficient than conventional zero-padding in the Fourier-based calculation of the susceptibility induced perturbation of the magnetic field in MR." Magnetic Resonance in Medicine, 2012, 68(2): 621-630.
• [43] Li, Huiyu; Shen, Chen; Wang, Gongji; Sun, Qinru; Yu, Kai; Li, Zefeng; Liang, XingGong; Chen, Run; Wu, Hao; Wang, Fan; Wang, Zhenyuan; Lian, Chunfeng. "BloodNet: An attention-based deep network for accurate, efficient, and costless bloodstain time since deposition inference." Briefings in Bioinformatics, 2023, 24(1).
• [44] Chun, Dayoung; Lee, Hyuk-Jae; Kim, Hyun. "PF-Training: Parameter Freezing for Efficient On-Device Training of CNN-based Object Detectors in Low-Resource Environments." In 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS 2024), pp. 21-25.
• [45] Ranjan, Ashish; Fahad, Md Shah; Fernandez-Baca, David; Tripathi, Sudhakar; Deepak, Akshay. "MCWS-Transformers: Towards an Efficient Modeling of Protein Sequences via Multi Context-Window Based Scaled Self-Attention." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023, 20(2): 1188-1199.
• [46] Lu, Wei; Pei, Han-Hsiang; Yu, Jheng-Rong; Chen, Hung-Ming; Huang, Po-Tsang. "A 28nm Energy-Area-Efficient Row-based Pipelined Training Accelerator with Mixed FXP4/FP16 for On-Device Transfer Learning." In 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024).
• [47] Haouli, Imed-Eddine; Hariri, Walid; Seridi-Bouchelaghem, Hassina. "COVID-Attention: Efficient COVID19 Detection Using Pre-trained Deep Models Based on Vision Transformers and X-ray Images." International Journal on Artificial Intelligence Tools, 2023, 32(8).