ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Cited by: 1
Authors
Wang, Jiahao [1 ]
Cao, Mingdeng [1 ]
Shi, Shuwei [1 ]
Wu, Baoyuan [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformer; data-free; distillation;
DOI
10.1109/ICASSP43922.2022.9747484
CLC Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Vision transformers (ViTs) require intensive computational resources to achieve high performance, which often makes them unsuitable for mobile devices. A feasible strategy is to compress them using the original training data, but that data may not be accessible due to privacy limitations or transmission restrictions. In this case, exploiting massive unlabeled data in the wild is an alternative paradigm, which has proven effective for compressing convolutional neural networks (CNNs). However, due to the significant differences in model structure and computation mechanism between CNNs and ViTs, it remains an open question whether a similar paradigm is suitable for ViTs. In this work, we propose a two-stage method to compress ViTs using unlabeled data in the wild. First, we design an effective tool, dubbed Attention Probe, for selecting valuable data from the wild. Second, based on the selected data, we develop a probe knowledge distillation algorithm to train a lightweight student transformer by maximizing the similarity of both the outputs and the intermediate features between the heavy teacher and the lightweight student models. Extensive experimental results on several benchmarks demonstrate that the student transformer obtained by the proposed method achieves performance comparable to the baseline that requires the original training data. Code is available at: https://github.com/IIGROUP/AttentionProbe.
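The distillation objective described in the abstract (matching both outputs and intermediate features between teacher and student) can be sketched as a weighted sum of a softened-output KL term and a feature-matching term. This is a minimal illustrative sketch, not the paper's exact formulation: the temperature `T`, weight `alpha`, and plain MSE feature matching are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def probe_distill_loss(teacher_logits, student_logits,
                       teacher_feats, student_feats,
                       T=4.0, alpha=0.5):
    """Sketch of an output + intermediate-feature distillation loss.

    teacher_feats / student_feats are lists of paired intermediate
    feature arrays of matching shapes (hypothetical interface).
    """
    # Output term: KL(teacher || student) on softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                        axis=-1)) * T * T
    # Feature term: mean-squared error between paired intermediate features.
    feat = np.mean([np.mean((ft - fs) ** 2)
                    for ft, fs in zip(teacher_feats, student_feats)])
    return alpha * kl + (1.0 - alpha) * feat
```

When the student exactly reproduces the teacher's logits and features, both terms vanish, so the loss is zero; in training, minimizing it pulls the student toward the teacher on the Attention-Probe-selected data.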
Pages: 2220-2224
Page count: 5
Related Papers
50 records in total
  • [41] ASAFormer: Visual tracking with convolutional vision transformer and asymmetric selective attention
    Gong, Xiaomei
    Zhang, Yi
    Hu, Shu
    Knowledge-Based Systems, 2024, 291
  • [42] Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention
    Zheng, Jiahao
    Yang, Longqi
    Li, Yiying
    Yang, Ke
    Wang, Zhiyuan
    Zhou, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 1484 - 1488
  • [43] DHFormer: A Vision Transformer-Based Attention Module for Image Dehazing
    Wasi, Abdul
    Shiney, O. Jeba
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I, 2024, 2009 : 148 - 159
  • [44] Vision transformer distillation for enhanced gastrointestinal abnormality recognition in wireless capsule endoscopy images
    Oukdach, Yassine
    Garbaz, Anass
    Kerkaou, Zakaria
    El Ansari, Mohamed
    Koutti, Lahcen
    Papachrysos, Nikolaos
    El Ouafdi, Ahmed Fouad
    de Lange, Thomas
    Distante, Cosimo
    JOURNAL OF MEDICAL IMAGING, 2025, 12 (01)
  • [45] Attention to Emotions: Body Emotion Recognition In-the-Wild Using Self-attention Transformer Network
    Paiva, Pedro V. V.
    Ramos, Josue J. G.
    Gavrilova, Marina
    Carvalho, Marco A. G.
    COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023, 2024, 2103 : 206 - 228
  • [46] A robust vision transformer-based approach for classification of labeled rices in the wild
    Ulukaya, Sezer
    Deari, Sabri
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2025, 231
  • [47] Script Identification in the Wild with FFT-Multi-grained Mix Attention Transformer
    Pan, Zhi
    Yang, Yaowei
    Ubul, Kurban
    Aysa, Alimjan
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 104 - 117
  • [48] EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention
    Shi, Yulong
    Sun, Mingwei
    Wang, Yongshuai
    Ma, Jiahao
    Chen, Zengqiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2025, 55 (03) : 1288 - 1300
  • [49] A novel twin vision transformer framework for crop disease classification with deformable attention
    Padshetty, Smitha
    Ambika
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 105
  • [50] A novel hybrid attention gate based on vision transformer for the detection of surface defects
    Uzen, Hueseyin
    Turkoglu, Muammer
    Ozturk, Dursun
    Hanbay, Davut
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 6835 - 6851