ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Cited: 1
Authors
Wang, Jiahao [1 ]
Cao, Mingdeng [1 ]
Shi, Shuwei [1 ]
Wu, Baoyuan [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformer; data-free; distillation
DOI
10.1109/ICASSP43922.2022.9747484
CLC Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Vision transformers (ViTs) require intensive computational resources to achieve high performance, which usually makes them unsuitable for mobile devices. A feasible strategy is to compress them using the original training data, which may not be accessible due to privacy limitations or transmission restrictions. In this case, utilizing massive unlabeled data in the wild is an alternative paradigm, which has proven effective for compressing convolutional neural networks (CNNs). However, due to the significant differences in model structure and computation mechanism between CNNs and ViTs, it remains an open question whether a similar paradigm is suitable for ViTs. In this work, we propose a two-stage method to effectively compress ViTs using unlabeled data in the wild. First, we design an effective tool for selecting valuable data from the wild, dubbed Attention Probe. Second, based on the selected data, we develop a probe knowledge distillation algorithm to train a lightweight student transformer by maximizing the similarity of both the outputs and intermediate features between the heavy teacher and the lightweight student. Extensive experimental results on several benchmarks demonstrate that the student transformer obtained by the proposed method achieves performance comparable to the baseline that requires the original training data. Code is available at: https://github.com/IIGROUP/AttentionProbe.
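The abstract names two technical pieces: attention-based selection of valuable wild data and a distillation loss over both outputs and intermediate features. The PyTorch sketch below illustrates one plausible reading of each stage. The function names, the [CLS]-attention scoring rule, and the loss weighting are assumptions made here for illustration, not the paper's actual implementation (see the linked repository for that); the feature term also assumes the student and teacher features already share dimensions, whereas real implementations typically insert a projection layer.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: function names, the [CLS]-attention scoring
# rule, and the loss weighting are assumptions, not the paper's method.

@torch.no_grad()
def attention_probe_score(cls_attn: torch.Tensor) -> torch.Tensor:
    """Score wild images from a teacher ViT's attention maps.

    cls_attn: (B, heads, N) attention weights from the [CLS] token to the
    N patch tokens of some teacher layer. Mean attention mass is used here
    as a proxy for how informative a sample is (an assumption).
    """
    return cls_attn.mean(dim=(1, 2))  # one score per image

def probe_distillation_loss(s_logits, t_logits, s_feats, t_feats,
                            alpha: float = 0.5, tau: float = 4.0):
    """Combine softened-logit KL with intermediate-feature MSE, mirroring
    the abstract's 'outputs and intermediate features' objective."""
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                  F.softmax(t_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    feat = sum(F.mse_loss(s, t.detach()) for s, t in zip(s_feats, t_feats))
    return alpha * kd + (1.0 - alpha) * feat

if __name__ == "__main__":
    B, H, N, C, D = 8, 12, 196, 1000, 384
    scores = attention_probe_score(torch.rand(B, H, N))
    keep = scores.topk(4).indices          # keep top-scoring wild samples
    loss = probe_distillation_loss(torch.randn(B, C), torch.randn(B, C),
                                   [torch.randn(B, D)], [torch.randn(B, D)])
    print(keep.tolist(), float(loss))
```

In this reading, selection and distillation decouple cleanly: the probe scores are computed once with the frozen teacher to filter the wild pool, and only the retained samples are fed through the combined loss during student training.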
Pages: 2220-2224
Page count: 5
Related Papers
50 records in total
  • [31] Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
    Leem, Saebom
    Seo, Hyunseok
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 2956 - 2964
  • [32] Abnormality Detection of Blast Furnace Tuyere Based on Knowledge Distillation and a Vision Transformer
    Song, Chuanwang
    Zhang, Hao
    Wang, Yuanjun
    Wang, Yuhui
    Hu, Keyong
    APPLIED SCIENCES-BASEL, 2023, 13 (18):
  • [33] Loop and distillation: Attention weights fusion transformer for fine-grained representation
    Fayou, Sun
    Ngo, Hea Choon
    Meng, Zuqiang
    Sek, Yong Wee
    IET COMPUTER VISION, 2023, 17 (04) : 473 - 482
  • [34] HaViT: Hybrid-Attention Based Vision Transformer for Video Classification
    Li, Li
    Zhuang, Liansheng
    Gao, Shenghua
    Wang, Shafei
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 502 - 517
  • [35] ASAFormer: Visual tracking with convolutional vision transformer and asymmetric selective attention
    Gong, Xiaomei
    Zhang, Yi
    Hu, Shu
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [36] CrossFormer: A Versatile Vision Transformer Hinging on Cross-Scale Attention
    Wang, Wenxiao
    Yao, Lu
    Chen, Long
    Lin, Binbin
    Cai, Deng
    He, Xiaofei
    Liu, Wei
    ICLR 2022 - 10th International Conference on Learning Representations, 2022,
  • [37] Patch attention convolutional vision transformer for facial expression recognition with occlusion
    Liu, Chang
    Hirota, Kaoru
    Dai, Yaping
    INFORMATION SCIENCES, 2023, 619 : 781 - 794
  • [38] Colorectal Polyp Segmentation Combining Pyramid Vision Transformer and Axial Attention
    Zhou, Xue
    Bai, Zhengyao
    Lu, Qianjie
    Fan, Shenglan
    Computer Engineering and Applications, 2023, 59 (11) : 222 - 230
  • [39] Hierarchical attention vision transformer for fine-grained visual classification
    Hu, Xiaobin
    Zhu, Shining
    Peng, Taile
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 91
  • [40] Facial Expression Recognition Based on Vision Transformer with Hybrid Local Attention
    Tian, Yuan
    Zhu, Jingxuan
    Yao, Huang
    Chen, Di
    APPLIED SCIENCES-BASEL, 2024, 14 (15):