Accelerating Transformers with Fourier-Based Attention for Efficient On-Device Inference

Cited by: 0

Authors:

Jo, Hyeonjin [1]

Sim, Chaerin [2]

Park, Jaewoo [1]

Lee, Jongeun [3]

Affiliations:

[1] Ulsan Natl Inst Sci & Technol UNIST, Dept CSE, Ulsan, South Korea

[2] Ulsan Natl Inst Sci & Technol UNIST, Sch New UNIStars, Ulsan, South Korea

[3] Ulsan Natl Inst Sci & Technol UNIST, Dept EE, Ulsan, South Korea

Source:

2023 20TH INTERNATIONAL SOC DESIGN CONFERENCE, ISOCC | 2023

Keywords:

Natural Language Processing; FPGA; Fourier Transform; Multi-head Attention

DOI:

10.1109/ISOCC59558.2023.10396620

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Multi-head attention-based transformers have achieved significant success in various natural language processing applications. However, their quadratic computational complexity and low arithmetic intensity present challenges for inference acceleration. To address this issue, attention mechanisms based on Fourier transforms have been proposed. Nevertheless, accelerating the complex arithmetic involved in the Fourier transform on systolic-array-based edge devices remains unexplored. In this paper, we analyze the inference of transformers on VTA, a tensor accelerator designed for mobile devices, and propose an efficient mapping of Fourier-based attention onto VTA. Our experimental results demonstrate that on-device inference of Fourier-based attention improves inference latency by up to 70.7%, and by 29.6% on average, compared to multi-head attention.
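The abstract does not say which Fourier-based attention variant is used or how it is mapped onto VTA. As a rough illustration only, the sketch below assumes an FNet-style mixing layer (the real part of a 2D DFT over the sequence and hidden dimensions) and shows how the complex arithmetic can be rewritten as real-valued matrix multiplications, the kind of operation a systolic-array accelerator executes. All function names are hypothetical, and this is not the authors' mapping.

```python
import cmath
import math

def dft_matrices(n):
    """Real (cos) and imaginary (-sin) parts of the n x n DFT matrix."""
    cos_m = [[math.cos(2 * math.pi * j * k / n) for k in range(n)] for j in range(n)]
    sin_m = [[-math.sin(2 * math.pi * j * k / n) for k in range(n)] for j in range(n)]
    return cos_m, sin_m

def matmul(a, b):
    """Plain real-valued matrix product, standing in for an accelerator GEMM."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def fourier_mixing_real(x):
    """Real part of the 2D DFT of x (seq_len x hidden) using only real GEMMs.

    With F = C + iS for each DFT matrix:
    Re(F_seq @ x @ F_hid) = C_seq @ x @ C_hid - S_seq @ x @ S_hid
    """
    seq_len, hidden = len(x), len(x[0])
    c_seq, s_seq = dft_matrices(seq_len)
    c_hid, s_hid = dft_matrices(hidden)
    cc = matmul(matmul(c_seq, x), c_hid)
    ss = matmul(matmul(s_seq, x), s_hid)
    return [[cc[i][j] - ss[i][j] for j in range(hidden)] for i in range(seq_len)]

def fourier_mixing_reference(x):
    """Direct complex 2D DFT, used only to check the real-valued version."""
    seq_len, hidden = len(x), len(x[0])
    out = []
    for u in range(seq_len):
        row = []
        for v in range(hidden):
            acc = 0j
            for s in range(seq_len):
                for h in range(hidden):
                    phase = u * s / seq_len + v * h / hidden
                    acc += x[s][h] * cmath.exp(-2j * math.pi * phase)
            row.append(acc.real)
        out.append(row)
    return out
```

Under these assumptions the mixing step needs four real matrix products and one elementwise subtraction, with no learned attention weights and no softmax.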


Pages: 203-204

Page count: 2

