Supervised Masked Knowledge Distillation for Few-Shot Transformers

Cited by: 19
Authors
Lin, Han [1]
Han, Guangxing [1]
Ma, Jiawei [1]
Huang, Shiyuan [1]
Lin, Xudong [1]
Chang, Shih-Fu [1]
Institutions
[1] Columbia Univ, New York, NY 10027 USA
Keywords
DOI
10.1109/CVPR52729.2023.01882
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformers (ViTs) have emerged to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features. However, under few-shot learning (FSL) settings on small datasets with only a few labeled samples, ViTs tend to overfit and suffer severe performance degradation due to the absence of CNN-like inductive biases. Previous works in FSL avoid this problem either with the help of self-supervised auxiliary losses, or through the dexterous use of label information under supervised settings. But the gap between self-supervised and supervised few-shot Transformers is still unfilled. Inspired by recent advances in self-supervised knowledge distillation and masked image modeling (MIM), we propose a novel Supervised Masked Knowledge Distillation model (SMKD) for few-shot Transformers which incorporates label information into self-distillation frameworks. Compared with previous self-supervised methods, we allow intra-class knowledge distillation on both class and patch tokens, and introduce the challenging task of masked patch token reconstruction across intra-class images. Experimental results on four few-shot classification benchmark datasets show that our method, despite its simple design, outperforms previous methods by a large margin and achieves a new state-of-the-art. Detailed ablation studies confirm the effectiveness of each component of our model. Code for this paper is available here: https://github.com/HL-hanlin/SMKD.
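Below is a minimal PyTorch sketch of the two losses the abstract describes: intra-class class-token distillation, and masked patch-token reconstruction across images of the same class. It is one plausible reading, not the authors' implementation (see the GitHub link above). All names and interfaces here are illustrative assumptions: smkd_step, sample_same_class_indices, the student(images, patch_mask=...) encoder signature, and the temperature and mask-ratio values. The projection heads and the EMA teacher update used in DINO/iBOT-style self-distillation are omitted for brevity, and token embeddings are distilled directly.

import torch
import torch.nn.functional as F

def distill_loss(student_out, teacher_out, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between a sharpened teacher distribution and the
    student distribution, as in DINO/iBOT-style self-distillation."""
    t = F.softmax(teacher_out / tau_t, dim=-1).detach()  # stop-gradient teacher
    s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

def sample_same_class_indices(labels):
    """For each sample, pick the index of a different sample sharing its
    label, falling back to the sample itself when no partner exists."""
    pair = torch.arange(len(labels), device=labels.device)
    for i in range(len(labels)):
        same = (labels == labels[i]).nonzero(as_tuple=True)[0]
        same = same[same != i]
        if len(same) > 0:
            pair[i] = same[torch.randint(len(same), (1,)).item()]
    return pair

def smkd_step(student, teacher, images, labels, num_patches, mask_ratio=0.3):
    """One hypothetical training step. `student` and `teacher` are ViT
    encoders returning (class_token, patch_tokens) of shapes [B, D] and
    [B, N, D]; the teacher is assumed to be an EMA copy of the student."""
    # Pair each image with another image of the same class.
    pair_idx = sample_same_class_indices(labels)

    # Teacher encodes the unmasked same-class partner image.
    with torch.no_grad():
        t_cls, t_patch = teacher(images[pair_idx])

    # Student encodes a randomly masked view of the original image.
    mask = torch.rand(images.shape[0], num_patches,
                      device=images.device) < mask_ratio  # [B, N] bool
    s_cls, s_patch = student(images, patch_mask=mask)

    # (1) Intra-class class-token distillation: pull the student's class
    #     token toward the teacher's class token of a same-class image.
    loss_cls = distill_loss(s_cls, t_cls)

    # (2) Masked patch-token reconstruction across intra-class images:
    #     distill only at the masked positions ([M, D] after indexing).
    loss_patch = distill_loss(s_patch[mask], t_patch[mask])

    return loss_cls + loss_patch

Distilling from a different image of the same class, rather than from another view of the same image, is what injects label information into the otherwise self-supervised distillation objective.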
Pages: 19649 - 19659
Page count: 11
Related Papers
(50 items in total)
  • [1] Hierarchical Knowledge Propagation and Distillation for Few-Shot Learning
    Zhou, Chunpeng
    Wang, Haishuai
    Zhou, Sheng
    Yu, Zhi
    Bandara, Danushka
    Bu, Jiajun
    NEURAL NETWORKS, 2023, 167 : 615 - 625
  • [2] Progressive Network Grafting for Few-Shot Knowledge Distillation
    Shen, Chengchao
    Wang, Xinchao
    Yin, Youtan
    Song, Jie
    Luo, Sihui
    Song, Mingli
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2541 - 2549
  • [3] Black-Box Few-Shot Knowledge Distillation
    Dang Nguyen
    Gupta, Sunil
    Do, Kien
    Venkatesh, Svetha
    COMPUTER VISION, ECCV 2022, PT XXI, 2022, 13681 : 196 - 211
  • [4] Knowledge Distillation Meets Few-Shot Learning: An Approach for Few-Shot Intent Classification Within and Across Domains
    Sauer, Anna
    Asaadi, Shima
    Kuech, Fabian
    PROCEEDINGS OF THE 4TH WORKSHOP ON NLP FOR CONVERSATIONAL AI, 2022, : 108 - 119
  • [5] EKD: Effective Knowledge Distillation for Few-Shot Sentiment Analysis
    Jiang, Kehan
Cai, Hongtian
    Lv, Yingda
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 164 - 176
  • [6] Generalized Few-Shot Node Classification With Graph Knowledge Distillation
    Wang, Jialong
    Zhou, Mengting
    Zhang, Shilong
    Gong, Zhiguo
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024
  • [7] Few-Shot Learning with Semi-Supervised Transformers for Electronic Health Records
    Poulain, Raphael
    Gupta, Mehak
    Beheshti, Rahmatollah
    MACHINE LEARNING FOR HEALTHCARE CONFERENCE, VOL 182, 2022, 182 : 853 - 873
  • [8] Uncertainty-Guided Semi-Supervised Few-Shot Class-Incremental Learning With Knowledge Distillation
    Cui, Yawen
    Deng, Wanxia
    Xu, Xin
    Liu, Zhen
    Liu, Zhong
    Pietikainen, Matti
    Liu, Li
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6422 - 6435
  • [9] DisRot: boosting the generalization capability of few-shot learning via knowledge distillation and self-supervised learning
    Ma, Chenyu
    Jia, Jinfang
    Huang, Jianqiang
    Wu, Li
    Wang, Xiaoying
    MACHINE VISION AND APPLICATIONS, 2024, 35 (03)
  • [10] Integrating Knowledge Distillation With Learning to Rank for Few-Shot Scene Classification
    Liu, Yishu
    Zhang, Liqiang
    Han, Zhengzhuo
    Chen, Conghui
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60