Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector

Cited: 5
|
Authors
Shang, Ronghua [1 ]
Li, Wenzheng [2 ]
Zhu, Songling [1 ]
Jiao, Licheng [1 ]
Li, Yangyang [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian, Shaanxi, Peoples R China
[2] Xidian Univ, Guangzhou Inst Technol, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; Linear classifier probes; Convolutional neural networks; Spatial attention; Model compression; NEURAL-NETWORKS; MODEL;
DOI
10.1016/j.neunet.2023.04.015
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) has been widely used in model compression. However, in current multi-teacher KD algorithms, the student can only passively acquire the teachers' middle-layer knowledge in a single form, and all teachers apply an identical guiding scheme to the student. To solve these problems, this paper proposes a multi-teacher KD method based on the joint Guidance of Probe and Adaptive Corrector (GPAC). First, GPAC proposes a teacher selection strategy guided by the Linear Classifier Probe (LCP). This strategy allows the student to select better teachers at the middle layers, evaluating each teacher by the classification accuracy measured with the LCP. Then, GPAC designs an adaptive multi-teacher instruction mechanism. The mechanism uses instructional weights to emphasize the student's predicted direction and reduce the difficulty of learning from the teachers. At the same time, each teacher formulates its guiding scheme according to the Kullback-Leibler divergence loss between itself and the student. Finally, GPAC develops a multi-level mechanism for adjusting the spatial attention loss, using a piecewise function that varies with the number of training epochs. This piecewise function divides the student's learning of spatial attention into three levels, making efficient use of the teachers' spatial attention. GPAC and current state-of-the-art distillation methods are tested on the CIFAR-10 and CIFAR-100 datasets. Experimental results demonstrate that the proposed method achieves higher classification accuracy. (c) 2023 Elsevier Ltd. All rights reserved.
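The three mechanisms summarized in the abstract can be made concrete with a short PyTorch sketch. This is not the authors' implementation: the probe setup, the inverse-KL weighting formula, and the piecewise breakpoints below are all assumptions chosen only to illustrate the ideas.

    import torch
    import torch.nn.functional as F

    def probe_accuracy(features, labels, probe):
        # Linear Classifier Probe (LCP): a small linear head measures how
        # linearly separable a teacher's middle-layer features are; the
        # resulting accuracy ranks candidate teachers. `probe` is assumed
        # to be a pre-trained torch.nn.Linear.
        logits = probe(features.flatten(1))
        return (logits.argmax(dim=1) == labels).float().mean().item()

    def adaptive_teacher_weights(student_logits, teacher_logits_list, T=4.0):
        # Weight each teacher by the inverse of its KL divergence from the
        # student, so teachers the student can follow more easily receive
        # larger instructional weights (one plausible reading of the
        # adaptive corrector; the exact formula is an assumption).
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        kls = torch.stack([
            F.kl_div(log_p_s, F.softmax(t / T, dim=1), reduction="batchmean")
            for t in teacher_logits_list
        ])
        inv = 1.0 / (kls + 1e-8)
        return inv / inv.sum()  # normalized per-teacher weights

    def spatial_attention(feat):
        # Channel-pooled spatial attention map of a B x C x H x W feature.
        return F.normalize(feat.pow(2).mean(dim=1).flatten(1), dim=1)

    def attention_loss_weight(epoch, total_epochs):
        # Piecewise function of the epoch dividing attention learning into
        # three levels (breakpoints and values are illustrative only).
        if epoch < total_epochs // 3:      # early: imitate teachers' attention
            return 1.0
        if epoch < 2 * total_epochs // 3:  # middle: moderate guidance
            return 0.5
        return 0.1                         # late: student specializes

In a training loop, each teacher's soft-label loss would be scaled by its adaptive weight, and the mean-squared error between the spatial_attention maps of student and teachers by attention_loss_weight(epoch, total_epochs).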
Pages: 345-356
Page count: 12
Related Papers
50 records in total
  • [1] Adaptive multi-teacher multi-level knowledge distillation
    Liu, Yuang
    Zhang, Wei
    Wang, Jun
    NEUROCOMPUTING, 2020, 415 : 106 - 113
  • [2] Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1943 - 1948
  • [3] ATMKD: adaptive temperature guided multi-teacher knowledge distillation
    Lin, Yu-e
    Yin, Shuting
    Ding, Yifeng
    Liang, Xingzhu
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [4] Decoupled Multi-teacher Knowledge Distillation based on Entropy
    Cheng, Xin
    Tang, Jialiang
    Zhang, Zhiqiang
    Yu, Wenxin
    Jiang, Ning
    Zhou, Jinjia
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [5] Anomaly detection based on multi-teacher knowledge distillation
    Ma, Ye
    Jiang, Xu
    Guan, Nan
    Yi, Wang
    JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 138
  • [6] Reinforced Multi-Teacher Selection for Knowledge Distillation
    Yuan, Fei
    Shou, Linjun
    Pei, Jian
    Lin, Wutao
    Gong, Ming
    Fu, Yan
    Jiang, Daxin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14284 - 14291
  • [7] Correlation Guided Multi-teacher Knowledge Distillation
    Shi, Luyao
    Jiang, Ning
    Tang, Jialiang
    Huang, Xinlei
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT IV, 2024, 14450 : 562 - 574
  • [8] Cross-View Gait Recognition Method Based on Multi-Teacher Joint Knowledge Distillation
    Li, Ruoyu
    Yun, Lijun
    Zhang, Mingxuan
    Yang, Yanchen
    Cheng, Feiyan
    SENSORS, 2023, 23 (22)
  • [9] Knowledge Distillation via Multi-Teacher Feature Ensemble
    Ye, Xin
    Jiang, Rongxin
    Tian, Xiang
    Zhang, Rui
    Chen, Yaowu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 566 - 570