Accelerating Transformers with Fourier-Based Attention for Efficient On-Device Inference

Cited by: 0
Authors
Jo, Hyeonjin [1 ]
Sim, Chaerin [2 ]
Park, Jaewoo [1 ]
Lee, Jongeun [3 ]
Affiliations
[1] Ulsan Natl Inst Sci & Technol UNIST, Dept CSE, Ulsan, South Korea
[2] Ulsan Natl Inst Sci & Technol UNIST, Sch New UNIStars, Ulsan, South Korea
[3] Ulsan Natl Inst Sci & Technol UNIST, Dept EE, Ulsan, South Korea
Keywords
Natural Language Processing; FPGA; Fourier Transform; Multi-head Attention
DOI
10.1109/ISOCC59558.2023.10396620
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Multi-head-attention-based transformers have achieved significant success in a wide range of natural language processing applications. However, their quadratic computational complexity and low arithmetic intensity make inference acceleration challenging. To address this issue, attention mechanisms based on Fourier transforms have been proposed. Nevertheless, accelerating the complex-valued arithmetic of the Fourier transform on systolic-array-based edge devices remains unexplored. In this paper, we analyze transformer inference on VTA, a tensor accelerator designed for mobile devices, and propose an efficient mapping of Fourier-based attention onto VTA. Our experimental results demonstrate that on-device inference with Fourier-based attention reduces inference latency by up to 70.7%, and by 29.6% on average, compared to multi-head attention.
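The abstract does not spell out the mapping itself, so the sketch below is only a minimal illustration, not the paper's implementation, of why Fourier-based token mixing suits a systolic array. Assuming an FNet-style layer that keeps Re(F_n X F_d) for a real input X (DFTs along the sequence and hidden dimensions), the complex arithmetic reduces to four real matrix multiplications, exactly the dense-matmul workload an accelerator like VTA executes natively. The helper names (`dft_matrices`, `fourier_mixing`) are illustrative, not from the paper.

```python
import numpy as np

def dft_matrices(n: int):
    """Return (C, S) with F = C + i*S the n-point DFT matrix,
    F[j, k] = exp(-2*pi*i*j*k / n)."""
    k = np.arange(n)
    ang = 2.0 * np.pi * np.outer(k, k) / n
    return np.cos(ang), -np.sin(ang)

def fourier_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing Re(F_n @ x @ F_d) for a real input x of
    shape (seq_len, hidden), decomposed into four real matmuls."""
    n, d = x.shape
    Cn, Sn = dft_matrices(n)
    Cd, Sd = dft_matrices(d)
    # For real x: Re((Cn + i*Sn) @ (x@Cd + i*(x@Sd))) = Cn@x@Cd - Sn@x@Sd,
    # i.e. four real matrix multiplications and one subtraction.
    return Cn @ (x @ Cd) - Sn @ (x @ Sd)

# Sanity check: the four-matmul form matches the FFT-based definition.
x = np.random.default_rng(0).standard_normal((8, 16))
ref = np.fft.fft(np.fft.fft(x, axis=-1), axis=0).real
assert np.allclose(fourier_mixing(x), ref)
```

Because every operation above is a real dense matmul, such a layer maps onto a systolic array without native complex-number support; the O(n^2) DFT-matrix form trades FFT's O(n log n) arithmetic for the high utilization of the array.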
Pages: 203-204 (2 pages)
Related Papers (47 in total)
• [41] Zadeh, Ali Hadi; Edo, Isak; Awad, Omar Mohamed; Moshovos, Andreas. "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference." In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2020), pp. 811-824.
• [42] Bouwman, Job G.; Bakker, Chris J. G. "Alias subtraction more efficient than conventional zero-padding in the Fourier-based calculation of the susceptibility induced perturbation of the magnetic field in MR." Magnetic Resonance in Medicine, 2012, 68(2): 621-630.
• [43] Li, Huiyu; Shen, Chen; Wang, Gongji; Sun, Qinru; Yu, Kai; Li, Zefeng; Liang, XingGong; Chen, Run; Wu, Hao; Wang, Fan; Wang, Zhenyuan; Lian, Chunfeng. "BloodNet: An attention-based deep network for accurate, efficient, and costless bloodstain time since deposition inference." Briefings in Bioinformatics, 2023, 24(1).
• [44] Chun, Dayoung; Lee, Hyuk-Jae; Kim, Hyun. "PF-Training: Parameter Freezing for Efficient On-Device Training of CNN-based Object Detectors in Low-Resource Environments." In 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS 2024), pp. 21-25.
• [45] Ranjan, Ashish; Fahad, Md Shah; Fernandez-Baca, David; Tripathi, Sudhakar; Deepak, Akshay. "MCWS-Transformers: Towards an Efficient Modeling of Protein Sequences via Multi Context-Window Based Scaled Self-Attention." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023, 20(2): 1188-1199.
• [46] Lu, Wei; Pei, Han-Hsiang; Yu, Jheng-Rong; Chen, Hung-Ming; Huang, Po-Tsang. "A 28nm Energy-Area-Efficient Row-based Pipelined Training Accelerator with Mixed FXP4/FP16 for On-Device Transfer Learning." In 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024).
• [47] Haouli, Imed-Eddine; Hariri, Walid; Seridi-Bouchelaghem, Hassina. "COVID-Attention: Efficient COVID19 Detection Using Pre-trained Deep Models Based on Vision Transformers and X-ray Images." International Journal on Artificial Intelligence Tools, 2023, 32(8).