Decoupled Multi-teacher Knowledge Distillation based on Entropy

Cited: 0
Authors
Cheng, Xin [1 ]
Tang, Jialiang [2 ]
Zhang, Zhiqiang [3 ]
Yu, Wenxin [3 ]
Jiang, Ning [3 ]
Zhou, Jinjia [1 ]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo, Japan
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China
[3] Southwest Univ Sci & Technol, Sch Comp Sci & Technol, Mianyang, Sichuan, Peoples R China
Keywords
Multi-teacher knowledge distillation; image classification; entropy; deep learning;
DOI
10.1109/ISCAS58744.2024.10558141
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835;
Abstract
Multi-teacher knowledge distillation (MKD) aims to leverage the valuable and diverse knowledge of multiple teacher networks to improve the performance of a student network. Existing approaches typically combine the teachers' knowledge by simply averaging their prediction logits or by applying sub-optimal weighting strategies. Such schemes cannot fully reflect the relative importance of each teacher and may even mislead the student's learning. To address these issues, we propose a novel Decoupled Multi-teacher Knowledge Distillation based on Entropy (DE-MKD). DE-MKD decomposes the vanilla KD loss and assigns each teacher a weight that reflects its importance, computed from the entropy of its predictions. Furthermore, we extend the proposed approach to distill the intermediate features of the teachers, further improving the performance of the student network. Extensive experiments on the publicly available CIFAR-100 image classification dataset demonstrate the effectiveness and flexibility of the proposed approach.
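A minimal sketch of the entropy-based teacher weighting described in the abstract, written in PyTorch. This is an illustrative reconstruction, not the authors' released implementation: the softmax(-entropy) weighting scheme, the temperature value, and the function name entropy_weighted_mkd_loss are assumptions, and the paper's loss-decoupling and feature-distillation components are omitted.

import torch
import torch.nn.functional as F

def entropy_weighted_mkd_loss(student_logits, teacher_logits_list, T=4.0):
    """Weight each teacher's KD loss by the entropy of its predictions.

    student_logits: (B, C) tensor; teacher_logits_list: list of (B, C) tensors.
    """
    # Mean predictive entropy per teacher: confident teachers score low.
    entropies = []
    for t_logits in teacher_logits_list:
        p = F.softmax(t_logits / T, dim=1)
        entropies.append(-(p * p.clamp_min(1e-12).log()).sum(dim=1).mean())
    # Lower entropy -> larger weight (assumed scheme, not the paper's formula).
    weights = F.softmax(-torch.stack(entropies), dim=0)
    # Weighted sum of per-teacher KL divergences, scaled by T^2 as in vanilla KD.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    loss = student_logits.new_zeros(())
    for w, t_logits in zip(weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=1)
        loss = loss + w * T * T * F.kl_div(log_p_student, p_teacher,
                                           reduction="batchmean")
    return loss

# Hypothetical usage with two teachers on a 100-class batch:
# s = torch.randn(32, 100)
# loss = entropy_weighted_mkd_loss(s, [torch.randn(32, 100), torch.randn(32, 100)])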
Pages: 5
Related Papers (50 in total)
  • [1] DE-MKD: Decoupled Multi-Teacher Knowledge Distillation Based on Entropy
    Cheng, Xin
    Zhang, Zhiqiang
    Weng, Wei
    Yu, Wenxin
    Zhou, Jinjia
    MATHEMATICS, 2024, 12 (11)
  • [2] Anomaly detection based on multi-teacher knowledge distillation
    Ma, Ye
    Jiang, Xu
    Guan, Nan
    Yi, Wang
    JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 138
  • [3] Reinforced Multi-Teacher Selection for Knowledge Distillation
    Yuan, Fei
    Shou, Linjun
    Pei, Jian
    Lin, Wutao
    Gong, Ming
    Fu, Yan
    Jiang, Daxin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14284 - 14291
  • [4] Correlation Guided Multi-teacher Knowledge Distillation
    Shi, Luyao
    Jiang, Ning
    Tang, Jialiang
    Huang, Xinlei
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT IV, 2024, 14450 : 562 - 574
  • [5] Knowledge Distillation via Multi-Teacher Feature Ensemble
    Ye, Xin
    Jiang, Rongxin
    Tian, Xiang
    Zhang, Rui
    Chen, Yaowu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 566 - 570
  • [6] CONFIDENCE-AWARE MULTI-TEACHER KNOWLEDGE DISTILLATION
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4498 - 4502
  • [7] Adaptive multi-teacher multi-level knowledge distillation
    Liu, Yuang
    Zhang, Wei
    Wang, Jun
    NEUROCOMPUTING, 2020, 415 : 106 - 113
  • [8] Robust Semantic Segmentation With Multi-Teacher Knowledge Distillation
    Amirkhani, Abdollah
    Khosravian, Amir
    Masih-Tehrani, Masoud
    Kashiani, Hossein
    IEEE ACCESS, 2021, 9 : 119049 - 119066