Bi-Level Orthogonal Multi-Teacher Distillation

Times cited: 0

Authors:

Gong, Shuyue [1]

Wen, Weigang [1]

Affiliation:

[1] School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, People's Republic of China

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 16

Keywords:

knowledge distillation; deep learning; convolutional neural networks; teacher-student model; optimization; multi-model learning; soft labeling; supervised learning

DOI:

10.3390/electronics13163345

CLC (Chinese Library Classification) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Multi-teacher knowledge distillation is a powerful technique that leverages diverse information sources from multiple pre-trained teachers to enhance student model performance. However, existing methods often overlook the challenge of effectively transferring knowledge to weaker student models. To address this limitation, we propose BOMD (Bi-level Optimization for Multi-teacher Distillation), a novel approach that combines bi-level optimization with multiple orthogonal projections. Our method employs orthogonal projections to align teacher feature representations with the student's feature space while preserving structural properties. This alignment is further reinforced through a dedicated feature alignment loss. Additionally, we utilize bi-level optimization to learn optimal weighting factors for combining knowledge from heterogeneous teachers, treating the weights as upper-level variables and the student's parameters as lower-level variables. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and flexibility of BOMD. Our method achieves state-of-the-art performance on the CIFAR-100 benchmark for multi-teacher knowledge distillation across diverse scenarios, consistently outperforming existing approaches. BOMD shows significant improvements for both homogeneous and heterogeneous teacher ensembles, even when distilling to compact student models.
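The abstract names three mechanisms (orthogonal projections with a feature alignment loss, weighted combination of teacher knowledge, and bi-level optimization of the weights) without giving their equations. The following is a toy NumPy sketch, not the authors' implementation, that illustrates each one under explicit assumptions: one projection matrix with orthonormal columns per teacher, fitted by projected gradient descent with a polar (SVD) retraction and scored by a mean-squared alignment loss; teacher soft labels mixed with softmax-normalized weights; a lower level that trains a linear softmax student on the mixed targets; and an upper level that updates the weight logits against a held-out cross-entropy using a finite-difference hypergradient. The linear student, the finite-difference hypergradient, the temperature of 2.0, and every function name (`fit_orthogonal_projection`, `learn_teacher_weights`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# --- Orthogonal projection + feature alignment loss (hypothetical: one projection per teacher) ---

def retract_to_orthonormal(m):
    """Polar retraction: nearest matrix with orthonormal columns (Frobenius norm)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def alignment_loss(teacher_feats, student_feats, proj):
    """Mean squared error between projected teacher features and student features."""
    return float(np.mean((teacher_feats @ proj - student_feats) ** 2))


def fit_orthogonal_projection(teacher_feats, student_feats, steps=200, lr=1.0, seed=0):
    """Fit proj of shape (d_t, d_s) with proj.T @ proj = I by projected gradient descent."""
    rng = np.random.default_rng(seed)
    d_t, d_s = teacher_feats.shape[1], student_feats.shape[1]
    proj = retract_to_orthonormal(rng.normal(size=(d_t, d_s)))
    for _ in range(steps):
        resid = teacher_feats @ proj - student_feats
        grad = 2.0 * teacher_feats.T @ resid / resid.size  # gradient of alignment_loss
        proj = retract_to_orthonormal(proj - lr * grad)     # step, then back onto the constraint set
    return proj


# --- Bi-level teacher weighting: alpha (upper level), student W (lower level) ---

def combined_targets(teacher_logits, alpha, temperature):
    """Mix soft labels of K teachers, logits shape (K, N, C), with weights softmax(alpha)."""
    return np.tensordot(softmax(alpha), softmax(teacher_logits / temperature), axes=1)


def train_student(x, targets, temperature, steps=100, lr=0.5):
    """Lower level: fit a linear softmax student to soft targets, starting from zeros."""
    w = np.zeros((x.shape[1], targets.shape[1]))
    for _ in range(steps):
        p = softmax(x @ w / temperature)
        w -= lr * x.T @ (p - targets) / (x.shape[0] * temperature)
    return w


def upper_loss(alpha, x_tr, teacher_logits, x_val, y_val, temperature):
    """Upper level: held-out hard-label cross-entropy of the student trained with alpha."""
    w = train_student(x_tr, combined_targets(teacher_logits, alpha, temperature), temperature)
    p = softmax(x_val @ w)
    return float(-np.mean(np.log(p[np.arange(len(y_val)), y_val] + 1e-12)))


def learn_teacher_weights(x_tr, teacher_logits, x_val, y_val,
                          temperature=2.0, outer_steps=20, lr=1.0, eps=1e-3):
    """Gradient descent on alpha using a central finite-difference hypergradient."""
    alpha = np.zeros(teacher_logits.shape[0])
    args = (x_tr, teacher_logits, x_val, y_val, temperature)
    for _ in range(outer_steps):
        grad = np.zeros_like(alpha)
        for k in range(alpha.size):
            step = np.zeros_like(alpha)
            step[k] = eps
            grad[k] = (upper_loss(alpha + step, *args) - upper_loss(alpha - step, *args)) / (2 * eps)
        alpha -= lr * grad
    return softmax(alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy features: teacher width 16, student width 8.
    t_feats = rng.normal(size=(200, 16))
    s_feats = t_feats @ retract_to_orthonormal(rng.normal(size=(16, 8)))
    proj = fit_orthogonal_projection(t_feats, s_feats)
    print("alignment loss:", alignment_loss(t_feats, s_feats, proj))

    # Toy classification: teacher 0 encodes the true labels, teacher 1 is random noise.
    x = rng.normal(size=(200, 8))
    y = np.argmax(x @ rng.normal(size=(8, 3)), axis=1)
    logits = np.stack([4.0 * np.eye(3)[y], 4.0 * rng.normal(size=(200, 3))])
    weights = learn_teacher_weights(x[:150], logits[:, :150], x[150:], y[150:])
    print("teacher weights:", weights)
```

On this toy data the learned weights should favor teacher 0, whose logits encode the true labels. The finite-difference hypergradient needs two full inner trainings per teacher per outer step, so it scales poorly with the number of teachers; a practical version would more likely differentiate through the inner loop with an autodiff framework.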

Pages: 15

References: 50