Sliding Cross Entropy for Self-Knowledge Distillation

Cited by: 3
Authors
Lee, Hanbeen [1]
Kim, Jeongho [1]
Woo, Simon S. [2]
Affiliations
[1] Sungkyunkwan Univ, Dept Artificial Intelligence, Suwon, South Korea
[2] Sungkyunkwan Univ, Coll Comp & Informat, Suwon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Representation Learning; Knowledge Distillation; Computer Vision;
DOI
10.1145/3511808.3557453
Chinese Library Classification (CLC) number
TP [Automation technology; computer technology];
Discipline classification code
0812;
Abstract
Knowledge distillation (KD) is a powerful technique for improving the performance of a small model by leveraging the knowledge of a larger model. Despite its remarkable performance boost, KD has the drawback of the substantial computational cost of pre-training larger models in advance. Recently, a method called self-knowledge distillation has emerged to improve a model's performance without a separately pre-trained teacher network. In this paper, we present a novel plug-in method called Sliding Cross Entropy (SCE), which can be combined with existing self-knowledge distillation to significantly improve performance. Specifically, to minimize the difference between the output of the model and the soft target obtained by self-distillation, we split each softmax representation into slices of a certain window size and reduce the distance between the corresponding slices. In this way, the model evenly considers all inter-class relationships of the soft target during optimization. Extensive experiments show that our approach is effective in various tasks, including classification, object detection, and semantic segmentation. We also demonstrate that SCE consistently outperforms existing baseline methods.
Pages: 1044-1053
Number of pages: 10
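
The abstract above describes the core idea of SCE: slicing the softened class distributions into windows and matching the student's output to the self-distilled soft target slice by slice, so that low-probability inter-class relations also contribute to the loss. Below is a minimal PyTorch sketch of that windowed matching, based only on the abstract; the window size, stride, temperature, confidence-based ordering, and per-slice renormalization are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


def sliding_cross_entropy(student_logits, teacher_logits,
                          window_size=10, stride=5, temperature=4.0):
    """Hypothetical sketch of a sliding-window distillation loss.

    Splits the softened class distributions into overlapping windows
    (ordered by the soft target's confidence) and matches the student
    to the soft target within each renormalized slice.
    """
    # Soften both distributions with a temperature, as in standard KD.
    p_s = F.softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)

    # Order classes by the soft target's confidence so each window covers
    # a contiguous band of the ranked distribution (an assumption about
    # how the slices are formed).
    order = torch.argsort(p_t, dim=1, descending=True)
    p_s = torch.gather(p_s, 1, order)
    p_t = torch.gather(p_t, 1, order)

    num_classes = p_t.size(1)
    loss, n_windows = 0.0, 0
    for start in range(0, num_classes - window_size + 1, stride):
        s_slice = p_s[:, start:start + window_size]
        t_slice = p_t[:, start:start + window_size]
        # Renormalize each slice into a valid distribution before matching.
        s_slice = s_slice / s_slice.sum(dim=1, keepdim=True).clamp_min(1e-8)
        t_slice = t_slice / t_slice.sum(dim=1, keepdim=True).clamp_min(1e-8)
        # Cross entropy between the soft-target slice and the student slice.
        loss = loss - (t_slice * torch.log(s_slice.clamp_min(1e-8))).sum(dim=1).mean()
        n_windows += 1
    return loss / max(n_windows, 1)
```

Since the paper presents SCE as a plug-in, such a term would presumably be added to an existing self-distillation objective, e.g. total_loss = F.cross_entropy(student_logits, labels) + lam * sliding_cross_entropy(student_logits, soft_target_logits), where soft_target_logits come from whichever self-distillation scheme SCE is combined with; the weighting lam is likewise an assumption here.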