Improving knowledge distillation via pseudo-multi-teacher network

Cited: 0
Authors
Li, Shunhang [1 ]
Shao, Mingwen [1 ]
Guo, Zihao [1 ]
Zhuang, Xinkai [1 ]
Affiliations
[1] China Univ Petr, Coll Comp Sci & Technol, Changjiang Rd, Qingdao 266580, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Knowledge distillation; Online distillation; Mutual learning;
DOI
10.1007/s00138-023-01383-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing knowledge distillation methods usually push the student model to directly imitate the features or probabilities of the teacher model. However, the teacher's knowledge capacity limits the student's ability to learn knowledge beyond what the teacher has discovered. To address this issue, we propose a pseudo-multi-teacher knowledge distillation method that augments the learning of such undiscovered knowledge. Specifically, we design an auxiliary classifier that captures cross-layer semantic information, providing the network with richer supervisory signals. In addition, we propose an ensemble module that combines the feature maps of each sub-network, generating a stronger ensemble of features to guide the network. Furthermore, the auxiliary classifiers and the ensemble module are discarded after training, so no additional parameters are introduced into the final model. Comprehensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method.
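The pipeline outlined in the abstract can be pictured with a short sketch: auxiliary classifiers attached to intermediate stages act as extra "pseudo teachers", an ensemble module fuses per-branch feature maps into one stronger teaching signal, and a temperature-scaled KL-divergence loss distills that ensemble back into each branch. The module names, the concat-plus-1x1-convolution fusion rule, and the layer sizes below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal, hypothetical sketch (PyTorch) of a pseudo-multi-teacher setup:
# auxiliary classifiers and the ensemble module exist only during training
# and are discarded afterwards, matching the abstract's description.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryClassifier(nn.Module):
    """Maps an intermediate feature map to class logits; dropped after training."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, num_classes),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.head(feat)


class EnsembleModule(nn.Module):
    """Fuses feature maps from several stages into one 'pseudo-teacher' signal."""

    def __init__(self, in_channels_list: list[int], num_classes: int):
        super().__init__()
        # Assumed fusion rule: channel-wise concat followed by a 1x1 convolution.
        self.fuse = nn.Conv2d(sum(in_channels_list), in_channels_list[-1], kernel_size=1)
        self.classifier = AuxiliaryClassifier(in_channels_list[-1], num_classes)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Resize every map to the spatial size of the deepest (smallest) one.
        h, w = feats[-1].shape[-2:]
        aligned = [F.adaptive_avg_pool2d(f, (h, w)) for f in feats]
        fused = self.fuse(torch.cat(aligned, dim=1))
        return self.classifier(fused)


def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """Standard temperature-scaled KL divergence between soft predictions."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    # Toy shapes: two stages of a backbone producing feature maps of different depth.
    feats = [torch.randn(2, 64, 16, 16), torch.randn(2, 128, 8, 8)]
    ensemble = EnsembleModule([64, 128], num_classes=100)
    aux = AuxiliaryClassifier(64, num_classes=100)

    teacher_logits = ensemble(feats)          # fused "pseudo-teacher" output
    student_logits = aux(feats[0])            # one branch's own prediction
    loss = distillation_loss(student_logits, teacher_logits.detach())
    print(loss.item())
```

In a full training step, each branch would be supervised by both the ground-truth labels (cross-entropy) and the ensemble's soft output (distillation loss); at inference only the backbone and its final classifier remain, which is why no extra parameters reach the deployed model.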
Pages: 11
Related Papers
50 records in total
  • [21] Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
    Li, Linfeng
    Su, Weixing
    Liu, Fang
    He, Maowei
    Liang, Xiaodan
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (05) : 6165 - 6180
  • [23] Dual knowledge distillation for visual tracking with teacher-student network
    Wang, Yuanyun
    Sun, Chuanyu
    Wang, Jun
    Chai, Bingfei
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (6-7) : 5203 - 5211
  • [24] Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1943 - 1948
  • [25] Improving neural ordinary differential equations via knowledge distillation
    Chu, Haoyu
    Wei, Shikui
    Lu, Qiming
    Zhao, Yao
    [J]. IET COMPUTER VISION, 2024, 18 (02) : 304 - 314
  • [26] Student Network Learning via Evolutionary Knowledge Distillation
    Zhang, Kangkai
    Zhang, Chunhui
    Li, Shikun
    Zeng, Dan
    Ge, Shiming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2251 - 2263
  • [27] ATMKD: adaptive temperature guided multi-teacher knowledge distillation
    Lin, Yu-e
    Yin, Shuting
    Ding, Yifeng
    Liang, Xingzhu
    [J]. MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [28] MTUW-GAN: A Multi-Teacher Knowledge Distillation Generative Adversarial Network for Underwater Image Enhancement
    Zhang, Tianchi
    Liu, Yuxuan
    Mase, Atsushi
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (02):
  • [29] Data-Free Low-Bit Quantization via Dynamic Multi-teacher Knowledge Distillation
    Huang, Chong
    Lin, Shaohui
    Zhang, Yan
    Li, Ke
    Zhang, Baochang
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 28 - 41
  • [30] MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings
    Wang, Kai
    Liu, Yu
    Ma, Qian
    Sheng, Quan Z.
    [J]. PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1716 - 1726