Improving knowledge distillation via an expressive teacher

Cited by: 14
Authors
Tan, Chao [1 ,2 ]
Liu, Jie [1 ,2 ]
Zhang, Xiang [3 ,4 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Lab Software Engn Complex Syst, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Inst Quantum Informat, Changsha 410073, Peoples R China
[4] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Neural network compression; Knowledge distillation; Knowledge transfer;
DOI
10.1016/j.knosys.2021.106837
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) is a widely used network compression technique that seeks a lightweight student network whose behavior is similar to that of its heavy teacher network. Previous studies mainly focus on training the student to mimic the representation space of the teacher; however, how to obtain a good teacher is rarely explored. We find that if a teacher is weak at capturing the knowledge underlying the true data, the student cannot learn that knowledge from its teacher either. Motivated by this, we propose an inter-class correlation regularization that trains the teacher to capture more explicit correlations among classes, and we further enforce the student to mimic the inter-class correlations of its teacher. Extensive image classification experiments have been conducted on four public benchmarks. For example, when the teacher and student networks are ShuffleNetV2-1.0 and ShuffleNetV2-0.5, our proposed method achieves a 42.63% top-1 error rate on Tiny ImageNet. (C) 2021 Elsevier B.V. All rights reserved.
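The abstract describes the method only at a high level: a standard KD objective plus a term that makes the student mimic the teacher's inter-class correlations. A minimal PyTorch-style sketch of such a student loss follows; the definition of the correlation matrix (correlations among softened class-probability columns over a batch), the temperature, the MSE matching, and the weights alpha and beta are illustrative assumptions rather than the paper's exact formulation, and the teacher-side regularizer is omitted because the abstract does not specify it.

import torch
import torch.nn.functional as F

def interclass_correlation(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    """Return a (C, C) matrix of correlations among class-probability columns."""
    probs = F.softmax(logits / temperature, dim=1)        # (B, C) softened probabilities
    probs = probs - probs.mean(dim=0, keepdim=True)       # center each class column
    probs = F.normalize(probs, p=2, dim=0, eps=1e-8)      # unit-norm columns
    return probs.t() @ probs                               # pairwise class correlations

def kd_with_correlation_loss(student_logits, teacher_logits, targets,
                             temperature=4.0, alpha=0.5, beta=1.0):
    # Hard-label cross-entropy on the student.
    ce = F.cross_entropy(student_logits, targets)
    # Standard soft-label KD term (Hinton et al.), scaled by T^2.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # Student mimics the teacher's inter-class correlation matrix (assumed MSE matching).
    corr_s = interclass_correlation(student_logits, temperature)
    corr_t = interclass_correlation(teacher_logits.detach(), temperature)
    corr = F.mse_loss(corr_s, corr_t)
    # alpha and beta are illustrative weights, not values reported in the paper.
    return ce + alpha * kd + beta * corr

In a training loop, this loss would replace the plain cross-entropy for the student while the teacher is kept frozen.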
Pages: 8
Related Papers
50 records in total
  • [1] Improving Knowledge Distillation With a Customized Teacher
    Tan, Chao
    Liu, Jie
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2290 - 2299
  • [2] Improving knowledge distillation via pseudo-multi-teacher network
    Li, Shunhang
    Shao, Mingwen
    Guo, Zihao
    Zhuang, Xinkai
    [J]. MACHINE VISION AND APPLICATIONS, 2023, 34 (02)
  • [4] Improved Knowledge Distillation via Teacher Assistant
    Mirzadeh, Seyed Iman
    Farajtabar, Mehrdad
    Li, Ang
    Levine, Nir
    Matsukawa, Akihiro
    Ghasemzadeh, Hassan
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5191 - 5198
  • [5] Improving Deep Mutual Learning via Knowledge Distillation
    Lukman, Achmad
    Yang, Chuan-Kai
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (15):
  • [6] Improving Knowledge Distillation via Head and Tail Categories
    Xu, Liuchi
    Ren, Jin
    Huang, Zhenhua
    Zheng, Weishi
    Chen, Yunwen
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3465 - 3480
  • [7] Knowledge Distillation via Multi-Teacher Feature Ensemble
    Ye, Xin
    Jiang, Rongxin
    Tian, Xiang
    Zhang, Rui
    Chen, Yaowu
[J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 566 - 570
  • [9] PURF: Improving teacher representations by imposing smoothness constraints for knowledge distillation
    Hossain, Md Imtiaz
    Akhter, Sharmen
    Hong, Choong Seon
    Huh, Eui-Nam
    [J]. APPLIED SOFT COMPUTING, 2024, 159
  • [10] Improving neural ordinary differential equations via knowledge distillation
    Chu, Haoyu
    Wei, Shikui
    Lu, Qiming
    Zhao, Yao
    [J]. IET COMPUTER VISION, 2024, 18 (02) : 304 - 314