Sliding Cross Entropy for Self-Knowledge Distillation

Cited by: 3
Authors
Lee, Hanbeen [1]
Kim, Jeongho [1]
Woo, Simon S. [2]
Affiliations
[1] Sungkyunkwan Univ, Dept Artificial Intelligence, Suwon, South Korea
[2] Sungkyunkwan Univ, Coll Comp & Informat, Suwon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Representation Learning; Knowledge Distillation; Computer Vision;
DOI
10.1145/3511808.3557453
Chinese Library Classification (CLC) number
TP [Automation technology; computer technology];
Discipline classification code
0812;
Abstract
Knowledge distillation (KD) is a powerful technique for improving the performance of a small model by leveraging the knowledge of a larger model. Despite its remarkable performance boost, KD has the drawback of the substantial computational cost of pre-training larger models in advance. Recently, a method called self-knowledge distillation has emerged to improve a model's performance without a separately pre-trained teacher network. In this paper, we present a novel plug-in method called Sliding Cross Entropy (SCE), which can be combined with existing self-knowledge distillation to significantly improve performance. Specifically, to minimize the difference between the output of the model and the soft target obtained by self-distillation, we split each softmax representation into slices of a certain window size and reduce the distance between the corresponding slices. In this way, the model evenly considers all inter-class relationships of the soft target during optimization. Extensive experiments show that our approach is effective in various tasks, including classification, object detection, and semantic segmentation. We also demonstrate that SCE consistently outperforms existing baseline methods.
Pages: 1044-1053
Number of pages: 10
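
The abstract above describes the core idea of SCE: slicing the softened class distributions into windows and matching the student's output to the self-distilled soft target slice by slice, so that low-probability inter-class relations also contribute to the loss. Below is a minimal PyTorch sketch of that windowed matching, based only on the abstract; the window size, stride, temperature, confidence-based ordering, and per-slice renormalization are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


def sliding_cross_entropy(student_logits, teacher_logits,
                          window_size=10, stride=5, temperature=4.0):
    """Hypothetical sketch of a sliding-window distillation loss.

    Splits the softened class distributions into overlapping windows
    (ordered by the soft target's confidence) and matches the student
    to the soft target within each renormalized slice.
    """
    # Soften both distributions with a temperature, as in standard KD.
    p_s = F.softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)

    # Order classes by the soft target's confidence so each window covers
    # a contiguous band of the ranked distribution (an assumption about
    # how the slices are formed).
    order = torch.argsort(p_t, dim=1, descending=True)
    p_s = torch.gather(p_s, 1, order)
    p_t = torch.gather(p_t, 1, order)

    num_classes = p_t.size(1)
    loss, n_windows = 0.0, 0
    for start in range(0, num_classes - window_size + 1, stride):
        s_slice = p_s[:, start:start + window_size]
        t_slice = p_t[:, start:start + window_size]
        # Renormalize each slice into a valid distribution before matching.
        s_slice = s_slice / s_slice.sum(dim=1, keepdim=True).clamp_min(1e-8)
        t_slice = t_slice / t_slice.sum(dim=1, keepdim=True).clamp_min(1e-8)
        # Cross entropy between the soft-target slice and the student slice.
        loss = loss - (t_slice * torch.log(s_slice.clamp_min(1e-8))).sum(dim=1).mean()
        n_windows += 1
    return loss / max(n_windows, 1)
```

Since the paper presents SCE as a plug-in, such a term would presumably be added to an existing self-distillation objective, e.g. total_loss = F.cross_entropy(student_logits, labels) + lam * sliding_cross_entropy(student_logits, soft_target_logits), where soft_target_logits come from whichever self-distillation scheme SCE is combined with; the weighting lam is likewise an assumption here.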