ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Cited by: 1
Authors
Wang, Jiahao [1 ]
Cao, Mingdeng [1 ]
Shi, Shuwei [1 ]
Wu, Baoyuan [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformer; data-free; distillation;
DOI
10.1109/ICASSP43922.2022.9747484
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Vision transformers (ViTs) require intensive computational resources to achieve high performance, which often makes them unsuitable for mobile devices. A feasible strategy is to compress them using the original training data, which may not be accessible due to privacy limitations or transmission restrictions. In this case, exploiting the massive unlabeled data in the wild is an alternative paradigm, which has been proven effective for compressing convolutional neural networks (CNNs). However, due to the significant differences in model structure and computation mechanism between CNNs and ViTs, it remains an open question whether a similar paradigm is suitable for ViTs. In this work, we propose a two-stage method to compress ViTs using unlabeled data in the wild. First, we design an effective tool for selecting valuable data from the wild, dubbed the Attention Probe. Second, based on the selected data, we develop a probe knowledge distillation algorithm that trains a lightweight student transformer by maximizing the similarity between the heavy teacher and the lightweight student on both outputs and intermediate features. Extensive experimental results on several benchmarks demonstrate that the student transformer obtained by the proposed method achieves performance comparable to the baseline that requires the original training data. Code is available at: https://github.com/IIGROUP/AttentionProbe.
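To make the two-stage recipe concrete, below is a minimal PyTorch sketch of (1) an attention-based value score for wild data and (2) a distillation loss that matches the student to the teacher on both output logits and one intermediate feature. The scoring heuristic (mean [CLS]-to-patch attention), the temperature tau, and the weight alpha are illustrative assumptions for this sketch, not the paper's exact formulation (see the authors' repository for that).

import torch
import torch.nn.functional as F

def attention_probe_score(attn_maps: torch.Tensor) -> torch.Tensor:
    """Score a batch of wild images by how strongly the teacher attends.

    attn_maps: (batch, heads, tokens, tokens) attention weights from one
    teacher layer. Returns a (batch,) score; higher means more valuable.
    Assumed heuristic: mean attention mass the [CLS] query places on
    patch tokens, averaged over heads.
    """
    cls_to_patches = attn_maps[:, :, 0, 1:]      # [CLS] row, patch columns
    return cls_to_patches.mean(dim=(1, 2))       # average over heads/patches

def probe_distillation_loss(student_logits, teacher_logits,
                            student_feat, teacher_feat,
                            tau: float = 4.0, alpha: float = 0.5):
    """Match the student to the teacher on outputs and a mid-level feature."""
    # Soft-label KL divergence on temperature-scaled logits (standard KD term).
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    # Feature imitation on one intermediate block via MSE; a projection layer
    # would be needed if the two hidden sizes differ (omitted here).
    feat = F.mse_loss(student_feat, teacher_feat)
    return alpha * kd + (1.0 - alpha) * feat

In practice, one would rank the unlabeled pool by attention_probe_score under the frozen teacher, keep the top-scoring samples, and then minimize probe_distillation_loss on them to train the student.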
Pages: 2220 - 2224
Number of pages: 5
Related Papers
50 records in total
  • [1] Vision Transformer With Quadrangle Attention
    Zhang, Qiming
    Zhang, Jing
    Xu, Yufei
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (05) : 3608 - 3624
  • [2] Vision Transformer with Deformable Attention
    Xia, Zhuofan
    Pan, Xuran
    Song, Shiji
    Li, Li Erran
    Huang, Gao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4784 - 4793
  • [3] Adder Attention for Vision Transformer
    Shu, Han
    Wang, Jiahao
    Chen, Hanting
    Li, Lin
    Yang, Yujiu
    Wang, Yunhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] CoAtFormer: Vision Transformer with Composite Attention
    Chang, Zhiyong
    Yin, Mingjun
    Wang, Yan
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 614 - 622
  • [5] FLatten Transformer: Vision Transformer using Focused Linear Attention
    Han, Dongchen
    Pan, Xuran
    Han, Yizeng
    Song, Shiji
    Huang, Gao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5938 - 5948
  • [6] Dense Attention: A Densely Connected Attention Mechanism for Vision Transformer
    Li, Nannan
    Chen, Yaran
    Zhao, Dongbin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [7] Fast Vision Transformer via Additive Attention
    Wen, Yang
    Chen, Samuel
    Shrestha, Abhishek Krishna
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 573 - 574
  • [8] Couplformer: Rethinking Vision Transformer with Coupling Attention
    Lan, Hai
    Wang, Xihao
    Shen, Hao
    Liang, Peidong
    Wei, Xian
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 6464 - 6473
  • [9] Hydrophobicity-Based Grading of Industrial Composite Insulators Images Using Cross Attention Vision Transformer With Knowledge Distillation
    Das, Samiran
    Chatterjee, Sujoy
    Basu, Mainak
    IEEE TRANSACTIONS ON DIELECTRICS AND ELECTRICAL INSULATION, 2024, 31 (01) : 523 - 532
  • [10] Vision Transformer Quantization with Multi-Step Knowledge Distillation
    Ranjan, Navin
    Savakis, Andreas
    SIGNAL PROCESSING, SENSOR/INFORMATION FUSION, AND TARGET RECOGNITION XXXIII, 2024, 13057