Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector

Cited by: 5
Authors
Shang, Ronghua [1 ]
Li, Wenzheng [2 ]
Zhu, Songling [1 ]
Jiao, Licheng [1 ]
Li, Yangyang [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian, Shaanxi, Peoples R China
[2] Xidian Univ, Guangzhou Inst Technol, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; Linear classifier probes; Convolutional neural networks; Spatial attention; Model compression; NEURAL-NETWORKS; MODEL;
DOI
10.1016/j.neunet.2023.04.015
CLC (Chinese Library Classification) number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) has been widely used in model compression. However, in current multi-teacher KD algorithms, the student can only passively acquire the knowledge of the teachers' middle layers in a single form, and all teachers apply an identical guiding scheme to the student. To solve these problems, this paper proposes a multi-teacher KD method based on the joint Guidance of a Probe and an Adaptive Corrector (GPAC). First, GPAC proposes a teacher selection strategy guided by the Linear Classifier Probe (LCP). This strategy allows the student to select better teachers at the middle layer: teachers are evaluated by the classification accuracy measured by the LCP. Then, GPAC designs an adaptive multi-teacher instruction mechanism. The mechanism uses instructional weights to emphasize the student's predicted direction and reduce the student's difficulty in learning from the teachers. At the same time, each teacher can formulate its guiding scheme according to the Kullback-Leibler divergence loss between the student and itself. Finally, GPAC develops a multi-level mechanism for adjusting the spatial attention loss. This mechanism uses a piecewise function that varies with the number of epochs to adjust the spatial attention loss. The piecewise function divides the student's learning of spatial attention into three levels, which makes efficient use of the teachers' spatial attention. GPAC and current state-of-the-art distillation methods are tested on the CIFAR-10 and CIFAR-100 datasets. The experimental results demonstrate that the proposed method obtains higher classification accuracy. (c) 2023 Elsevier Ltd. All rights reserved.
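
The abstract outlines three mechanisms (LCP-guided teacher selection, adaptive KL-weighted instruction, and a three-level piecewise coefficient for the spatial attention loss) but gives no formulas. Below is a minimal PyTorch sketch of how such a training loss could be assembled. It assumes softened-logit KL distillation, squared-activation attention maps, and evenly spaced stage breakpoints; the temperature, the weighting rule, and the breakpoint values are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def probe_accuracy(probe, feats, labels):
        # Linear classifier probe (LCP): a linear head on pooled intermediate
        # features; its accuracy is used to rank candidate teachers.
        with torch.no_grad():
            pooled = F.adaptive_avg_pool2d(feats, 1).flatten(1)
            preds = probe(pooled).argmax(dim=1)
        return (preds == labels).float().mean().item()

    def kl_to_teacher(student_logits, teacher_logits, T=4.0):
        # Softened-logit KL divergence between the student and one teacher.
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits / T, dim=1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

    def adaptive_teacher_weights(student_logits, teacher_logits_list, T=4.0):
        # Hypothetical weighting: teachers whose predictions are closer to the
        # student (smaller KL) receive larger instructional weights.
        kls = torch.stack([kl_to_teacher(student_logits, t, T).detach()
                           for t in teacher_logits_list])
        return F.softmax(-kls, dim=0)

    def spatial_attention(feat):
        # Spatial attention map: mean of squared activations over channels,
        # L2-normalized per sample.
        att = feat.pow(2).mean(dim=1)              # (B, H, W)
        return F.normalize(att.flatten(1), dim=1)  # (B, H*W)

    def attention_coeff(epoch, total_epochs):
        # Piecewise (three-level) coefficient over training; breakpoints and
        # values are illustrative, not taken from the paper.
        if epoch < total_epochs // 3:
            return 1.0   # early stage: imitate teacher attention strongly
        if epoch < 2 * total_epochs // 3:
            return 0.5   # middle stage: relax the attention constraint
        return 0.1       # late stage: emphasize task and logit losses

    def gpac_style_loss(student_logits, teacher_logits_list,
                        student_feat, teacher_feats, labels,
                        epoch, total_epochs, T=4.0):
        w = adaptive_teacher_weights(student_logits, teacher_logits_list, T)
        kd = sum(wi * kl_to_teacher(student_logits, ti, T)
                 for wi, ti in zip(w, teacher_logits_list))
        s_att = spatial_attention(student_feat)
        at = sum(F.mse_loss(s_att, spatial_attention(tf))
                 for tf in teacher_feats) / len(teacher_feats)
        ce = F.cross_entropy(student_logits, labels)
        return ce + kd + attention_coeff(epoch, total_epochs) * at

In this sketch, probe_accuracy would be evaluated separately to decide which teachers supply intermediate-layer supervision, while gpac_style_loss combines the cross-entropy, weighted logit-distillation, and stage-scaled attention terms for one training step.
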
Pages: 345-356
Page count: 12
Related papers
50 records in total
  • [31] Enhanced Accuracy and Robustness via Multi-teacher Adversarial Distillation
    Zhao, Shiji
    Yu, Jie
    Sun, Zhenlong
    Zhang, Bo
    Wei, Xingxing
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 585 - 602
  • [32] LGFA-MTKD: Enhancing Multi-Teacher Knowledge Distillation with Local and Global Frequency Attention
    Cheng, Xin
    Zhou, Jinjia
    INFORMATION (SWITZERLAND), 2024, 15 (11)
  • [33] MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition
    Gui, Shuangchun
    Wang, Zhenkun
    Chen, Jixiang
    Zhou, Xun
    Zhang, Chen
    Cao, Yi
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (04) : 1628 - 1639
  • [34] Multi-Teacher Distillation With Single Model for Neural Machine Translation
    Liang, Xiaobo
    Wu, Lijun
    Li, Juntao
    Qin, Tao
    Zhang, Min
    Liu, Tie-Yan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 992 - 1002
  • [35] Multi-teacher Universal Distillation Based on Information Hiding for Defense Against Facial Manipulation
    Li, Xin
    Ni, Rongrong
    Zhao, Yao
    Ni, Yu
    Li, Haoliang
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 5293 - 5307
  • [36] Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
    Cao, Shengcao
    Li, Mengtian
    Hays, James
    Ramanan, Deva
    Wang, Yu-Xiong
    Gui, Liang-Yan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [37] MTUW-GAN: A Multi-Teacher Knowledge Distillation Generative Adversarial Network for Underwater Image Enhancement
    Zhang, Tianchi
    Liu, Yuxuan
    Mase, Atsushi
    APPLIED SCIENCES-BASEL, 2024, 14 (02)
  • [38] Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
    Yang, Ze
    Shou, Linjun
    Gong, Ming
    Lin, Wutao
    Jiang, Daxin
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 690 - 698
  • [39] Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks
    Cuong Pham
    Tuan Hoang
    Thanh-Toan Do
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 6424 - 6432
  • [40] Data-Free Low-Bit Quantization via Dynamic Multi-teacher Knowledge Distillation
    Huang, Chong
    Lin, Shaohui
    Zhang, Yan
    Li, Ke
    Zhang, Baochang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 28 - 41