HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers

Cited by: 25
Authors
Dong, Peiyan [1 ]
Sun, Mengshu [1 ]
Lu, Alec [2 ]
Xie, Yanyue [1 ]
Liu, Kenneth [2 ]
Kong, Zhenglun [1 ]
Meng, Xin [1 ]
Li, Zhengang [1 ]
Lin, Xue [1 ]
Fang, Zhenman [2 ]
Wang, Yanzhi [1 ]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] Simon Fraser Univ, Burnaby, BC, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Vision Transformer; FPGA Accelerator; Hardware and Software Co-design; Data-level Sparsity;
DOI
10.1109/HPCA56546.2023.10071047
CLC Number
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
While vision transformers (ViTs) have continuously achieved new milestones in computer vision, their sophisticated network architectures with high computation and memory costs have impeded their deployment on resource-limited edge devices. In this paper, we propose HeatViT, a hardware-efficient image-adaptive token pruning framework for efficient yet accurate ViT acceleration on embedded FPGAs. Based on the inherent computational patterns in ViTs, we first adopt an effective, hardware-efficient, and learnable head-evaluation token selector, which can be progressively inserted before transformer blocks to dynamically identify and consolidate the non-informative tokens from input images. Moreover, we implement the token selector on hardware by adding miniature control logic that heavily reuses the existing hardware components built for the backbone ViT. To further improve hardware efficiency, we employ 8-bit fixed-point quantization and propose polynomial approximations, with a regularization effect on quantization error, for the frequently used nonlinear functions in ViTs. Compared to existing ViT pruning studies, at similar computation cost HeatViT achieves 0.7%-8.9% higher accuracy, while at similar model accuracy it achieves more than 28.4%-65.3% computation reduction, for various widely used ViTs, including DeiT-T, DeiT-S, DeiT-B, LV-ViT-S, and LV-ViT-M, on the ImageNet dataset. Compared to the baseline hardware accelerator, our implementations of HeatViT on the Xilinx ZCU102 FPGA achieve a 3.46x-4.89x speedup with a trivial resource-utilization overhead of 8%-11% more DSPs and 5%-8% more LUTs.
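The core idea the abstract describes — identify non-informative tokens and consolidate them rather than discard them — can be illustrated with a minimal NumPy sketch. This is not the paper's learnable head-evaluation selector: here the per-token importance scores are taken as given, and the pruned tokens are merged into a single summary token by a score-weighted average. The function name, the `keep_ratio` parameter, and the weighting scheme are illustrative assumptions.

```python
import numpy as np

def prune_and_consolidate(tokens, scores, keep_ratio=0.7):
    """Keep the highest-scoring tokens; merge the rest into one token.

    tokens: (N, D) array of token embeddings (class token excluded).
    scores: (N,) importance scores, e.g. from a small selector module.
    """
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    order = np.argsort(scores)[::-1]       # indices by descending importance
    keep_idx = np.sort(order[:n_keep])     # preserve original token order
    drop_idx = np.sort(order[n_keep:])

    kept = tokens[keep_idx]
    if len(drop_idx) > 0:
        # Score-weighted average of the pruned tokens -> one summary token,
        # so their information is consolidated rather than discarded.
        w = scores[drop_idx]
        w = w / (w.sum() + 1e-8)
        package = (w[:, None] * tokens[drop_idx]).sum(axis=0, keepdims=True)
        kept = np.concatenate([kept, package], axis=0)
    return kept
```

Inserting such a step before a transformer block shrinks the token sequence (and thus the quadratic attention cost) for that block and all later ones, which is why the abstract reports computation reductions that grow with how early and aggressively tokens are pruned.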
Pages: 442-455
Page count: 14
Related Papers
50 records in total
  • [1] Adaptive Token Sampling for Efficient Vision Transformers
    Fayyaz, Mohsen
    Koohpayegani, Soroush Abbasi
    Jafari, Farnoush Rezaei
    Sengupta, Sunando
    Joze, Hamid Reza Vaezi
    Sommerlade, Eric
    Pirsiavash, Hamed
    Gall, Juergen
    COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 396 - 414
  • [2] Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation
    Tang, Quan
    Zhang, Bowen
    Liu, Jiajun
    Liu, Fagui
    Liu, Yifan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 777 - 786
  • [3] An Attention-Based Token Pruning Method for Vision Transformers
    Luo, Kaicheng
    Li, Huaxiong
    Zhou, Xianzhong
    Huang, Bing
    ROUGH SETS, IJCRS 2022, 2022, 13633 : 274 - 288
  • [4] Learned Token Pruning for Transformers
    Kim, Sehoon
    Shen, Sheng
    Thorsley, David
    Gholami, Amir
    Kwon, Woosuk
    Hassoun, Joseph
    Keutzer, Kurt
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 784 - 794
  • [5] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
    Rao, Yongming
    Zhao, Wenliang
    Liu, Benlin
    Lu, Jiwen
    Zhou, Jie
    Hsieh, Cho-Jui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
    Wei, Siyuan
    Ye, Tianzhu
    Zhang, Shen
    Tang, Yao
    Liang, Jiajun
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2092 - 2101
  • [7] Making Vision Transformers Efficient from A Token Sparsification View
    Chang, Shuning
    Wang, Pichao
    Lin, Ming
    Wang, Fan
    Zhang, David Junhao
    Jin, Rong
    Shou, Mike Zheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6195 - 6205
  • [8] ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
    Norouzi, Narges
    Orlova, Svetlana
    de Geus, Daan
    Dubbelman, Gijs
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 15773 - 15782
  • [9] Make a Long Image Short: Adaptive Token Length for Vision Transformers
    Zhou, Qiqi
    Zhu, Yichen
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT II, 2023, 14170 : 69 - 85
  • [10] Hardware-efficient color correlation-adaptive demosaicing with multifiltering
    Lee, Seung Hyun
    Choi, Dong Yoon
    Song, Byung Cheol
    JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (01)