Bi-Level Orthogonal Multi-Teacher Distillation

Cited: 0
Authors
Gong, Shuyue [1]
Wen, Weigang [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Mech Elect & Control Engn, Beijing 100044, Peoples R China
Keywords
knowledge distillation; deep learning; convolutional neural networks; teacher-student model; optimization; multi-model learning; soft labeling; supervised learning
DOI
10.3390/electronics13163345
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Multi-teacher knowledge distillation is a powerful technique that leverages diverse information sources from multiple pre-trained teachers to enhance student model performance. However, existing methods often overlook the challenge of effectively transferring knowledge to weaker student models. To address this limitation, we propose BOMD (Bi-level Optimization for Multi-teacher Distillation), a novel approach that combines bi-level optimization with multiple orthogonal projections. Our method employs orthogonal projections to align teacher feature representations with the student's feature space while preserving structural properties. This alignment is further reinforced through a dedicated feature alignment loss. Additionally, we utilize bi-level optimization to learn optimal weighting factors for combining knowledge from heterogeneous teachers, treating the weights as upper-level variables and the student's parameters as lower-level variables. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and flexibility of BOMD. Our method achieves state-of-the-art performance on the CIFAR-100 benchmark for multi-teacher knowledge distillation across diverse scenarios, consistently outperforming existing approaches. BOMD shows significant improvements for both homogeneous and heterogeneous teacher ensembles, even when distilling to compact student models.
Pages: 15
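
The abstract above describes two mechanisms: orthogonal projections that map each teacher's features into the student's feature space (reinforced by a feature-alignment loss), and a bi-level optimization in which the teacher-combination weights are upper-level variables and the student's parameters are lower-level variables. Below is a minimal PyTorch sketch of how those pieces could fit together. It is not the authors' implementation: it assumes PyTorch >= 2.0 (for torch.func.functional_call), models that return (logits, features), a one-step-unrolled approximation of the bi-level problem, and illustrative names such as OrthogonalAligner, train_loss, and bilevel_step.

```python
# Minimal sketch of the two mechanisms described in the abstract; all names,
# hyperparameters, and the unrolling scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call
from torch.nn.utils.parametrizations import orthogonal


class OrthogonalAligner(nn.Module):
    """Maps teacher features into the student's feature space through a
    (semi-)orthogonal linear projection, preserving pairwise structure."""

    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        self.proj = orthogonal(nn.Linear(teacher_dim, student_dim, bias=False))

    def forward(self, t_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(t_feat)


def train_loss(student, teachers, aligners, alpha, x, y, params=None,
               T: float = 4.0, beta: float = 1.0):
    """Lower-level objective: CE + alpha-weighted soft-label KD + alignment.
    `params` optionally supplies virtual student parameters for unrolling."""
    if params is None:
        s_logits, s_feat = student(x)
    else:
        s_logits, s_feat = functional_call(student, params, (x,))
    weights = F.softmax(alpha, dim=0)          # convex combination over teachers
    loss = F.cross_entropy(s_logits, y)
    for w, teacher, aligner in zip(weights, teachers, aligners):
        with torch.no_grad():                  # teachers are frozen
            t_logits, t_feat = teacher(x)
        loss = loss + w * T * T * F.kl_div(
            F.log_softmax(s_logits / T, dim=1),
            F.softmax(t_logits / T, dim=1), reduction="batchmean")
        loss = loss + beta * w * F.mse_loss(aligner(t_feat), s_feat)
    return loss


def bilevel_step(student, teachers, aligners, alpha, opt_student, opt_alpha,
                 train_batch, val_batch, inner_lr: float = 0.05):
    """One alternation: update alpha through a one-step virtual student update
    evaluated on held-out data, then update the real student with the current
    (detached) weights. `opt_student` should also own the aligners' parameters."""
    xt, yt = train_batch
    xv, yv = val_batch

    # Upper level: differentiate the validation loss of a one-step-updated
    # "virtual" student with respect to the teacher weights alpha.
    params = dict(student.named_parameters())
    inner = train_loss(student, teachers, aligners, alpha, xt, yt)
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    virtual = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
    v_logits, _ = functional_call(student, virtual, (xv,))
    opt_alpha.zero_grad()
    F.cross_entropy(v_logits, yv).backward()
    opt_alpha.step()

    # Lower level: ordinary student/aligner update with fixed teacher weights.
    opt_student.zero_grad()
    train_loss(student, teachers, aligners, alpha.detach(), xt, yt).backward()
    opt_student.step()
```

In a training loop under these assumptions, `alpha` would be an `nn.Parameter` of length equal to the number of teachers with its own optimizer, and `opt_student` would also cover the aligners' parameters so the orthogonal projections are learned jointly with the student.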