Accelerating Transformers with Fourier-Based Attention for Efficient On-Device Inference

Authors
Jo, Hyeonjin [1]
Sim, Chaerin [2]
Park, Jaewoo [1]
Lee, Jongeun [3]
Affiliations
[1] Ulsan Natl Inst Sci & Technol UNIST, Dept CSE, Ulsan, South Korea
[2] Ulsan Natl Inst Sci & Technol UNIST, Sch New UNIStars, Ulsan, South Korea
[3] Ulsan Natl Inst Sci & Technol UNIST, Dept EE, Ulsan, South Korea
Keywords
Natural Language Processing; FPGA; Fourier Transform; Multi-head Attention
DOI
10.1109/ISOCC59558.2023.10396620
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Transformers based on multi-head attention have achieved significant success in a range of natural language processing applications. However, their quadratic computational complexity and low arithmetic intensity make inference hard to accelerate. Attention mechanisms based on Fourier transforms have been proposed to address this issue, but accelerating the complex arithmetic of the Fourier transform on systolic-array-based edge devices remains unexplored. In this paper, we analyze transformer inference on VTA, a tensor accelerator designed for mobile devices, and propose an efficient mapping of Fourier-based attention onto VTA. Our experimental results show that on-device inference with Fourier-based attention reduces inference latency by up to 70.7%, and by 29.6% on average, compared to multi-head attention.
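The abstract does not spell out how the complex arithmetic is mapped, so the following is only an illustrative sketch and not the authors' method. It assumes an FNet-style token-mixing layer (the real part of a 2D DFT over the sequence and hidden dimensions) and shows one generic way to express it with real-valued matrix multiplications only, which is the kind of operation a systolic array executes. The function names `dft_matrices` and `fourier_mixing` are hypothetical.

```python
import numpy as np

def dft_matrices(n):
    """Return the real and imaginary parts (C, S) of the n x n DFT matrix W = C + iS."""
    k = np.arange(n)
    angles = 2.0 * np.pi * np.outer(k, k) / n
    # W[j, k] = exp(-2*pi*i*j*k/n) = cos(angle) - i*sin(angle)
    return np.cos(angles), -np.sin(angles)

def fourier_mixing(x):
    """Assumed FNet-style mixing: Re(DFT_seq(DFT_hidden(x))) using four real matmuls."""
    seq_len, hidden = x.shape
    c_seq, s_seq = dft_matrices(seq_len)
    c_hid, s_hid = dft_matrices(hidden)
    # DFT along the hidden dimension: x @ (C + iS) = x@C + i*(x@S)
    real_h = x @ c_hid
    imag_h = x @ s_hid
    # DFT along the sequence dimension, keeping only the real part:
    # Re[(C + iS) @ (A + iB)] = C@A - S@B
    return c_seq @ real_h - s_seq @ imag_h

# Usage: compare against NumPy's reference 2D FFT on a small random input.
x = np.random.default_rng(0).standard_normal((8, 4))
mixed = fourier_mixing(x)
reference = np.fft.fft2(x).real
```

Splitting the DFT matrix into cosine and sine parts trades one complex product for several real ones; whether that trade pays off on a given accelerator depends on details the abstract does not give.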
Pages: 203-204
Page count: 2