Bi-Level Orthogonal Multi-Teacher Distillation

Cited by: 0
Authors
Gong, Shuyue [1 ]
Wen, Weigang [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Mech Elect & Control Engn, Beijing 100044, Peoples R China
Keywords
knowledge distillation; deep learning; convolutional neural networks; teacher-student model; optimization; multi-model learning; soft labeling; supervised learning
DOI
10.3390/electronics13163345
CLC Classification Number
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Multi-teacher knowledge distillation is a powerful technique that leverages diverse information sources from multiple pre-trained teachers to enhance student model performance. However, existing methods often overlook the challenge of effectively transferring knowledge to weaker student models. To address this limitation, we propose BOMD (Bi-level Optimization for Multi-teacher Distillation), a novel approach that combines bi-level optimization with multiple orthogonal projections. Our method employs orthogonal projections to align teacher feature representations with the student's feature space while preserving structural properties. This alignment is further reinforced through a dedicated feature alignment loss. Additionally, we utilize bi-level optimization to learn optimal weighting factors for combining knowledge from heterogeneous teachers, treating the weights as upper-level variables and the student's parameters as lower-level variables. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and flexibility of BOMD. Our method achieves state-of-the-art performance on the CIFAR-100 benchmark for multi-teacher knowledge distillation across diverse scenarios, consistently outperforming existing approaches. BOMD shows significant improvements for both homogeneous and heterogeneous teacher ensembles, even when distilling to compact student models.
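The abstract describes two mechanisms: orthogonal projections that map each teacher's features into the student's feature space (trained with a feature-alignment loss), and per-teacher weighting factors treated as upper-level variables in a bi-level optimization. The snippet below is a minimal, hypothetical sketch of how those pieces could fit together in PyTorch; it is not the authors' implementation, and names such as OrthogonalProjection, feature_alignment_loss, the feature dimensions, and the penalty coefficient are illustrative assumptions. The outer (upper-level) loop that would actually optimize the teacher weights is omitted; the weights only appear where they would enter the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrthogonalProjection(nn.Module):
    """Maps a teacher's features into the student's feature space.

    The projection matrix is initialized orthogonally and kept close to
    orthogonal through a soft penalty on W W^T - I, so the structure of
    the teacher representation is largely preserved (an assumption made
    here for illustration, not the paper's exact formulation).
    """

    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(student_dim, teacher_dim))
        nn.init.orthogonal_(self.weight)

    def forward(self, teacher_feat: torch.Tensor) -> torch.Tensor:
        # (batch, teacher_dim) -> (batch, student_dim)
        return teacher_feat @ self.weight.t()

    def orthogonality_penalty(self) -> torch.Tensor:
        gram = self.weight @ self.weight.t()
        eye = torch.eye(gram.size(0), device=gram.device)
        return ((gram - eye) ** 2).mean()


def feature_alignment_loss(student_feat, teacher_feats, projections,
                           teacher_weight_logits, ortho_coef=1e-3):
    """Weighted feature-alignment loss over all teachers.

    `teacher_weight_logits` stands in for the upper-level weighting
    variables; a real bi-level scheme would update them in an outer loop
    rather than jointly with the student parameters.
    """
    weights = F.softmax(teacher_weight_logits, dim=0)
    loss = torch.zeros((), device=student_feat.device)
    for w, t_feat, proj in zip(weights, teacher_feats, projections):
        aligned = proj(t_feat)
        loss = loss + w * F.mse_loss(student_feat, aligned)
        loss = loss + ortho_coef * proj.orthogonality_penalty()
    return loss


if __name__ == "__main__":
    # Random tensors stand in for real student/teacher backbone features.
    student_feat = torch.randn(8, 128)
    teacher_feats = [torch.randn(8, 256), torch.randn(8, 512)]
    projections = nn.ModuleList([OrthogonalProjection(256, 128),
                                 OrthogonalProjection(512, 128)])
    teacher_weight_logits = nn.Parameter(torch.zeros(len(teacher_feats)))

    loss = feature_alignment_loss(student_feat, teacher_feats,
                                  projections, teacher_weight_logits)
    loss.backward()  # gradients reach the projections and the weight logits
    print(f"feature-alignment loss: {loss.item():.4f}")
```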
Pages: 15