Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

Cited: 0
Authors
Kong, Zhenglun [1 ]
Ma, Haoyu [2 ]
Yuan, Geng [1 ]
Sun, Mengshu [1 ]
Xie, Yanyue [1 ]
Dong, Peiyan [1 ]
Meng, Xin [3 ]
Shen, Xuan [1 ]
Tang, Hao [4 ]
Qin, Minghai [5 ]
Chen, Tianlong [6 ]
Ma, Xiaolong [7 ]
Xie, Xiaohui [2 ]
Wang, Zhangyang [6 ]
Wang, Yanzhi [1 ]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] Univ Calif Irvine, Irvine, CA USA
[3] Peking Univ, Beijing, Peoples R China
[4] Swiss Fed Inst Technol, CVL, Zurich, Switzerland
[5] Western Digital Res, San Jose, CA USA
[6] Univ Texas Austin, Austin, TX USA
[7] Clemson Univ, Clemson, SC USA
Funding
National Science Foundation (USA);
Keywords
DOI
(none)
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, so time-consuming training remains unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper aims to introduce sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme by exploring the sparsity under three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve ViT accuracy rather than compromise it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViT
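The three-level scheme described in the abstract can be illustrated with a toy numpy sketch. This is not the authors' implementation (see the linked repository for that); the scoring heuristics here (token-norm importance, top-k attention thresholding) are stand-in assumptions chosen only to make the hierarchy concrete: prune examples, then patches per example, then attention connections per token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 8 training examples, each with 16 patch tokens of dim 4.
tokens = rng.normal(size=(8, 16, 4))

# Level 1: example-level sparsity -- keep a subset of the training
# examples, scored here by a stand-in importance (mean token norm).
example_scores = np.linalg.norm(tokens, axis=-1).mean(axis=1)
keep = np.argsort(example_scores)[-6:]            # keep 6 of 8 examples
tokens = tokens[keep]

# Level 2: token (patch) sparsity -- keep the top-k patches per example.
token_scores = np.linalg.norm(tokens, axis=-1)    # (6, 16)
topk = np.argsort(token_scores, axis=1)[:, -12:]  # keep 12 of 16 tokens
tokens = np.take_along_axis(tokens, topk[..., None], axis=1)  # (6, 12, 4)

# Level 3: attention-connection sparsity -- in a toy attention matrix,
# zero out all but the k strongest connections per query token.
attn = tokens @ tokens.transpose(0, 2, 1)         # (6, 12, 12)
k = 4
row_thresh = np.sort(attn, axis=-1)[..., [-k]]    # k-th largest per row
attn_sparse = np.where(attn >= row_thresh, attn, 0.0)
```

Each level shrinks a different axis of the training workload: fewer examples per epoch, fewer tokens per forward pass, and fewer attention entries per layer.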
Pages: 8360-8368
Page count: 9
Related Papers
26 items in total
  • [21] Reduction of the size of the learning data in a probabilistic neural network by hierarchical clustering. Application to the discrimination of seeds by artificial vision
    Chtioui, Y
    Bertrand, D
    Barba, D
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1996, 35 (02) : 175 - 186
  • [23] An Efficient Voxel-Based Segmentation Algorithm Based on Hierarchical Clustering to Extract LIDAR Power Equipment Data in Transformer Substations
    Guo, Jianlong
    Feng, Weixia
    Xue, Jiang
    Xiong, Shan
    Hao, Tengfei
    Li, Ruiheng
    Mao, Huben
    IEEE ACCESS, 2020, 8 : 227482 - 227496
  • [24] An Efficient CNN Accelerator Achieving High PE Utilization Using a Dense-/Sparse-Aware Redundancy Reduction Method and Data-Index Decoupling Workflow
    Meng, Yishuo
    Yang, Chen
    Xiang, Siwei
    Wang, Jianfei
    Mei, Kuizhi
    Geng, Li
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2023, 31 (10) : 1537 - 1550
  • [26] Two-level energy-efficient data reduction strategies based on SAX-LZW and hierarchical clustering for minimizing the huge data conveyed on the internet of things networks
    Al-Qurabat, Ali Kadhum M.
    Abdulzahra, Suha Abdulhussein
    Idrees, Ali Kadhum
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (16): : 17844 - 17890