A multi-level collaborative self-distillation learning for improving adaptive inference efficiency

Times Cited: 0
Authors
Zhang, Likun [1,3]
Li, Jinbao [2 ]
Zhang, Benqian [4 ]
Guo, Yahong [5 ]
Affiliations
[1] Heilongjiang Univ, Sch Elect Engn, Harbin 150080, Peoples R China
[2] Qilu Univ Technol, Shandong Artificial Intelligence Inst, Shandong Acad Sci, Sch Math & Stat, Jinan 250014, Peoples R China
[3] East Univ Heilongjiang, Sch Informat Engn, Harbin 150066, Peoples R China
[4] Qilu Univ Technol, Shandong Acad Sci, Sch Math & Stat, Jinan 250353, Peoples R China
[5] Qilu Univ Technol, Shandong Acad Sci, Sch Comp Sci & Technol, Jinan 250353, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-level collaborative self-distillation; Multi-exit network; Collaborative learning; Adaptive inference; Efficient computing;
DOI
10.1007/s40747-024-01572-3
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
A multi-exit network is an important technique for achieving adaptive inference by dynamically allocating computational resources based on different input samples. The existing works mainly treat the final classifier as the teacher, enhancing the classification accuracy by transferring knowledge to the intermediate classifiers. However, this traditional self-distillation training strategy only utilizes the knowledge contained in the final classifier, neglecting potentially distinctive knowledge in the other classifiers. To address this limitation, we propose a novel multi-level collaborative self-distillation learning strategy (MLCSD) that extracts knowledge from all the classifiers. MLCSD dynamically determines the weight coefficients for each classifier's contribution through a learning process, thus constructing more comprehensive and effective teachers tailored to each classifier. These new teachers transfer the knowledge back to each classifier through a distillation technique, thereby further improving the network's inference efficiency. We conduct experiments on three datasets, CIFAR10, CIFAR100, and Tiny-ImageNet. Compared with the baseline network that employs traditional self-distillation, our MLCSD-Net based on ResNet18 enhances the average classification accuracy by 1.18%. The experimental results demonstrate that MLCSD-Net improves the inference efficiency of adaptive inference applications, such as anytime prediction and budgeted batch classification. Code is available at https://github.com/deepzlk/MLCSD-Net.
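The abstract describes the core mechanism: for each exit classifier, a teacher is formed as a learned weighted combination of all classifiers' outputs, and that teacher is distilled back into the exit. The PyTorch snippet below is a minimal sketch of one way such a loss could be written; it is an illustration under assumed names (MLCSDLoss, exit_logits, mix_weights, alpha, temperature), not the authors' released implementation, which is available at the linked repository.

# Minimal sketch of a multi-level collaborative self-distillation loss.
# Assumptions: the backbone returns a list with one logit tensor per exit,
# and the per-exit mixing coefficients are learned jointly with the network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLCSDLoss(nn.Module):
    """For every exit, build a teacher as a learned convex combination of all
    exits' logits, then distill that teacher back into the exit, in addition
    to the usual cross-entropy on the hard labels."""

    def __init__(self, num_exits: int, temperature: float = 3.0, alpha: float = 0.5):
        super().__init__()
        # One learnable weight row per exit; softmax turns each row into
        # mixing coefficients over all exits (the "collaborative" part).
        self.mix_weights = nn.Parameter(torch.zeros(num_exits, num_exits))
        self.temperature = temperature
        self.alpha = alpha  # trade-off between hard-label CE and distillation KL

    def forward(self, exit_logits, targets):
        T = self.temperature
        # Detach the logits used to build the teachers so gradients on the
        # teacher side only update the mixing coefficients, not the students.
        stacked = torch.stack(exit_logits, dim=0).detach()        # (E, B, C)
        coeffs = F.softmax(self.mix_weights, dim=1)               # (E, E), rows sum to 1
        teachers = torch.einsum('ij,jbc->ibc', coeffs, stacked)   # teacher logits per exit

        total = 0.0
        for i, logits in enumerate(exit_logits):
            ce = F.cross_entropy(logits, targets)
            kl = F.kl_div(
                F.log_softmax(logits / T, dim=1),
                F.softmax(teachers[i] / T, dim=1),
                reduction='batchmean',
            ) * (T * T)
            total = total + (1.0 - self.alpha) * ce + self.alpha * kl
        return total / len(exit_logits)


# Hypothetical usage with a multi-exit backbone returning one logit tensor per exit:
#   criterion = MLCSDLoss(num_exits=4)
#   optimizer = torch.optim.SGD(
#       list(model.parameters()) + list(criterion.parameters()), lr=0.1)
#   loss = criterion(model(images), labels)  # model(images) -> [logits_1, ..., logits_4]

Including the loss module's parameters in the optimizer is what lets the mixing coefficients be "dynamically determined through a learning process" as the abstract puts it; how the paper actually constrains or regularizes those coefficients is not specified in this record.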
Pages: 19