Knowledge Distillation from A Stronger Teacher

Cited by: 0
Authors
Huang, Tao [1 ,2 ]
You, Shan [1 ]
Wang, Fei [3 ]
Qian, Chen [1 ]
Xu, Chang [2 ]
Affiliations
[1] SenseTime Res, Hong Kong, Peoples R China
[2] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW, Australia
[3] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
Funding
Australian Research Council;
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong and competitive as state-of-the-art approaches, this paper presents a method dubbed DIST to distill better from a stronger teacher. We empirically find that the discrepancy between the predictions of the student and a stronger teacher tends to be fairly severe. As a result, exactly matching the predictions via KL divergence would disturb the training and make existing methods perform poorly. In this paper, we show that simply preserving the relations between the predictions of teacher and student suffices, and we propose a correlation-based loss to capture the intrinsic inter-class relations from the teacher explicitly. Moreover, considering that different instances have different semantic similarities to each class, we also extend this relational match to the intra-class level. Our method is simple yet practical, and extensive experiments demonstrate that it adapts well to various architectures, model sizes, and training strategies, and consistently achieves state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. Code is available at: https://github.com/hunto/DIST_KD.
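The relational matching the abstract describes can be illustrated with a minimal NumPy sketch. This is an assumption-laden reconstruction, not the authors' implementation (see the linked repository for that): it interprets "preserving relations" as maximizing the Pearson correlation between student and teacher softmax predictions, taken across classes for each instance (inter-class) and across instances for each class (intra-class). The function names `dist_style_loss` and `pearson_rows` and the equal weighting of the two terms are illustrative choices.

```python
import numpy as np

def softmax(x, tau=1.0):
    # Temperature-scaled softmax along the class axis.
    z = x / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pearson_rows(a, b, eps=1e-8):
    # Pearson correlation between corresponding rows of a and b.
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1)) + eps
    return num / den

def dist_style_loss(student_logits, teacher_logits, tau=1.0):
    """Sketch of a correlation-based relational KD loss.

    inter: for each instance, correlate predictions across classes;
    intra: for each class, correlate predictions across instances.
    Each term is 1 - mean Pearson correlation, so identical
    (or linearly related) predictions give (near-)zero loss.
    """
    ps = softmax(student_logits, tau)
    pt = softmax(teacher_logits, tau)
    inter = 1.0 - pearson_rows(ps, pt).mean()       # rows = instances
    intra = 1.0 - pearson_rows(ps.T, pt.T).mean()   # rows = classes
    return inter + intra

# Identical predictions correlate perfectly, so the loss vanishes,
# while a student that inverts the teacher's ranking is penalized.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
print(dist_style_loss(logits, logits))          # near zero
print(dist_style_loss(logits[:, ::-1], logits)) # clearly positive
```

Because Pearson correlation is invariant to shift and scale, this loss only asks the student to preserve the teacher's relative ordering and relational structure rather than to match its exact probability values, which is the key relaxation over plain KL matching motivated in the abstract.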
Pages: 12
Related Papers
50 records in total
  • [1] Temperature Annealing Knowledge Distillation from Averaged Teacher
    Gu, Xiaozhe
    Zhang, Zixun
    Luo, Tao
    [J]. 2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW), 2022, : 133 - 138
  • [2] Knowledge Distillation with the Reused Teacher Classifier
    Chen, Defang
    Mei, Jian-Ping
    Zhang, Hailin
    Wang, Can
    Feng, Yan
    Chen, Chun
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11923 - 11932
  • [3] Improving Knowledge Distillation With a Customized Teacher
    Tan, Chao
    Liu, Jie
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2290 - 2299
  • [4] Knowledge distillation: A good teacher is patient and consistent
    Beyer, Lucas
    Zhai, Xiaohua
    Royer, Amelie
    Markeeva, Larisa
    Anil, Rohan
    Kolesnikov, Alexander
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10915 - 10924
  • [5] Knowledge Distillation with a Precise Teacher and Prediction with Abstention
    Xu, Yi
    Pu, Jian
    Zhao, Hui
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9000 - 9006
  • [6] A Two-Teacher Framework for Knowledge Distillation
    Chen, Xingjian
    Su, Jianbo
    Zhang, Jun
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2019, PT I, 2019, 11554 : 58 - 66
  • [7] Improved Knowledge Distillation via Teacher Assistant
    Mirzadeh, Seyed Iman
    Farajtabar, Mehrdad
    Li, Ang
    Levine, Nir
    Matsukawa, Akihiro
    Ghasemzadeh, Hassan
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5191 - 5198
  • [8] Improving knowledge distillation via an expressive teacher
    Tan, Chao
    Liu, Jie
    Zhang, Xiang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 218
  • [9] Learning From Teacher's Failure: A Reflective Learning Paradigm for Knowledge Distillation
    Xu, Kai
    Wang, Lichun
    Xin, Jianjia
    Li, Shuang
    Yin, Baocai
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 384 - 396
  • [10] Reinforced Multi-Teacher Selection for Knowledge Distillation
    Yuan, Fei
    Shou, Linjun
    Pei, Jian
    Lin, Wutao
    Gong, Ming
    Fu, Yan
    Jiang, Daxin
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14284 - 14291