Semantic Stage-Wise Learning for Knowledge Distillation

Cited by: 0
Authors
Liu, Dongqin [1 ,2 ]
Li, Wentao [1 ,2 ]
Zhou, Wei [1 ]
Li, Zhaoxing [1 ,2 ]
Dai, Jiao [1 ]
Han, Jizhong [1 ]
Li, Ruixuan [3 ]
Hu, Songlin [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
Keywords
knowledge distillation; image classification; object detection; instance segmentation
DOI
10.1109/ICME55011.2023.00145
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation enhances the performance of a student model by transferring knowledge from a teacher model. Recently, attention mechanisms have been introduced that let each student layer learn from all teacher layers, yielding considerable gains. However, features from different layers, such as shallow and deep layers, can have a large semantic gap, and forcibly aligning one student layer to all teacher layers can mislead the learning process. To tackle this problem, we present an effective framework called Semantic Stage-Wise learning for Knowledge Distillation (SSWKD). We divide all layers into shallow and deep stages and allow feature alignment only within the same stage, alleviating semantic mismatch. In addition, observing that the performance of deep networks relies more on a few key features than evenly on all of them, we propose a crucial feature enhancement method for SSWKD based on KL divergence, forcing the student to pay more attention to the teacher's critical features. Extensive experiments and visualizations show that SSWKD outperforms other distillation methods on the CIFAR-100 and COCO2017 datasets for image classification, object detection, and instance segmentation tasks.
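The paper's exact formulation is not reproduced in this record. As a minimal illustrative sketch only, assuming student and teacher features have already been projected to a common shape per layer, the two ideas in the abstract (attention-based alignment restricted to within-stage teacher layers, and a KL-divergence term steering the student toward the teacher's salient features) might look like this; all function and variable names here are hypothetical, not from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def stage_wise_alignment_loss(student_feats, teacher_feats, split):
    """Align each student layer only with teacher layers in the same
    semantic stage: shallow = layers[:split], deep = layers[split:]."""
    total = 0.0
    for stage in (slice(0, split), slice(split, None)):
        s_stage = student_feats[stage]
        t_stage = teacher_feats[stage]
        for s in s_stage:
            # Attention weights over teacher layers of this stage only,
            # so no shallow<->deep pairing can mislead the alignment.
            sims = np.array([float(s.ravel() @ t.ravel()) for t in t_stage])
            w = softmax(sims)
            target = sum(wi * t for wi, t in zip(w, t_stage))
            total += float(((s - target) ** 2).mean())
    return total

def kl_saliency_loss(s_feat, t_feat, eps=1e-8):
    """Treat channel-averaged activation magnitudes as spatial attention
    distributions; the KL term is minimized when the student attends to
    the same regions the teacher deems important."""
    p = softmax(np.abs(t_feat).mean(axis=0).ravel())  # teacher attention
    q = softmax(np.abs(s_feat).mean(axis=0).ravel())  # student attention
    return float((p * np.log((p + eps) / (q + eps))).sum())

# Usage: four layers per network, split into two stages of two layers each.
rng = np.random.default_rng(0)
student = [rng.standard_normal((8, 16)) for _ in range(4)]
teacher = [rng.standard_normal((8, 16)) for _ in range(4)]
align = stage_wise_alignment_loss(student, teacher, split=2)
salient = sum(kl_saliency_loss(s, t) for s, t in zip(student, teacher))
```

In a real training loop these terms would be weighted and added to the task loss; the weighting scheme, feature projections, and the precise attention/KL definitions are design choices of the paper that this sketch does not attempt to reproduce.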
Pages: 816-821
Number of pages: 6