Improving knowledge distillation via pseudo-multi-teacher network

Cited: 0
Authors
Li, Shunhang [1 ]
Shao, Mingwen [1 ]
Guo, Zihao [1 ]
Zhuang, Xinkai [1 ]
Affiliations
[1] China Univ Petr, Coll Comp Sci & Technol, Changjiang Rd, Qingdao 266580, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Knowledge distillation; Online distillation; Mutual learning;
DOI
10.1007/s00138-023-01383-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing knowledge distillation methods usually push the student model to directly imitate the features or probabilities of the teacher model. However, the teacher's knowledge capacity limits the student's ability to learn knowledge beyond what the teacher has discovered. To address this issue, we propose a pseudo-multi-teacher knowledge distillation method that augments the learning of such undiscovered knowledge. Specifically, we design an auxiliary classifier that captures cross-layer semantic information, providing the network with richer supervisory signals. In addition, we propose an ensemble module that combines the feature maps of each sub-network, generating a stronger ensemble of features to guide the network. Furthermore, the auxiliary classifiers and the ensemble module are discarded after training, so no additional parameters are introduced into the final model. Comprehensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method.
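The pipeline outlined in the abstract can be pictured with a short sketch: auxiliary classifiers attached to intermediate stages act as extra "pseudo teachers", an ensemble module fuses per-branch feature maps into one stronger teaching signal, and a temperature-scaled KL-divergence loss distills that ensemble back into each branch. The module names, the concat-plus-1x1-convolution fusion rule, and the layer sizes below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal, hypothetical sketch (PyTorch) of a pseudo-multi-teacher setup:
# auxiliary classifiers and the ensemble module exist only during training
# and are discarded afterwards, matching the abstract's description.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryClassifier(nn.Module):
    """Maps an intermediate feature map to class logits; dropped after training."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, num_classes),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.head(feat)


class EnsembleModule(nn.Module):
    """Fuses feature maps from several stages into one 'pseudo-teacher' signal."""

    def __init__(self, in_channels_list: list[int], num_classes: int):
        super().__init__()
        # Assumed fusion rule: channel-wise concat followed by a 1x1 convolution.
        self.fuse = nn.Conv2d(sum(in_channels_list), in_channels_list[-1], kernel_size=1)
        self.classifier = AuxiliaryClassifier(in_channels_list[-1], num_classes)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Resize every map to the spatial size of the deepest (smallest) one.
        h, w = feats[-1].shape[-2:]
        aligned = [F.adaptive_avg_pool2d(f, (h, w)) for f in feats]
        fused = self.fuse(torch.cat(aligned, dim=1))
        return self.classifier(fused)


def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """Standard temperature-scaled KL divergence between soft predictions."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    # Toy shapes: two stages of a backbone producing feature maps of different depth.
    feats = [torch.randn(2, 64, 16, 16), torch.randn(2, 128, 8, 8)]
    ensemble = EnsembleModule([64, 128], num_classes=100)
    aux = AuxiliaryClassifier(64, num_classes=100)

    teacher_logits = ensemble(feats)          # fused "pseudo-teacher" output
    student_logits = aux(feats[0])            # one branch's own prediction
    loss = distillation_loss(student_logits, teacher_logits.detach())
    print(loss.item())
```

In a full training step, each branch would be supervised by both the ground-truth labels (cross-entropy) and the ensemble's soft output (distillation loss); at inference only the backbone and its final classifier remain, which is why no extra parameters reach the deployed model.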
Pages: 11
Related Papers
50 records in total
  • [21] Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
    Li, Linfeng
    Su, Weixing
    Liu, Fang
    He, Maowei
    Liang, Xiaodan
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (05) : 6165 - 6180
  • [23] Dual knowledge distillation for visual tracking with teacher-student network
    Wang, Yuanyun
    Sun, Chuanyu
    Wang, Jun
    Chai, Bingfei
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (6-7) : 5203 - 5211
  • [24] Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
    Zhang, Hailin
    Chen, Defang
    Wang, Can
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1943 - 1948
  • [25] Improving neural ordinary differential equations via knowledge distillation
    Chu, Haoyu
    Wei, Shikui
    Lu, Qiming
    Zhao, Yao
    [J]. IET COMPUTER VISION, 2024, 18 (02) : 304 - 314
  • [26] Student Network Learning via Evolutionary Knowledge Distillation
    Zhang, Kangkai
    Zhang, Chunhui
    Li, Shikun
    Zeng, Dan
    Ge, Shiming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2251 - 2263
  • [27] ATMKD: adaptive temperature guided multi-teacher knowledge distillation
    Lin, Yu-e
    Yin, Shuting
    Ding, Yifeng
    Liang, Xingzhu
    [J]. MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [28] MTUW-GAN: A Multi-Teacher Knowledge Distillation Generative Adversarial Network for Underwater Image Enhancement
    Zhang, Tianchi
    Liu, Yuxuan
    Mase, Atsushi
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (02):
  • [29] Data-Free Low-Bit Quantization via Dynamic Multi-teacher Knowledge Distillation
    Huang, Chong
    Lin, Shaohui
    Zhang, Yan
    Li, Ke
    Zhang, Baochang
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 28 - 41
  • [30] MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings
    Wang, Kai
    Liu, Yu
    Ma, Qian
    Sheng, Quan Z.
    [J]. PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1716 - 1726