Leveraging angular distributions for improved knowledge distillation

Times cited: 5

Authors:

Jeon, Eun Som [1]

Choi, Hongjun [1]

Shukla, Ankita [1]

Turaga, Pavan [1]

Affiliation:

[1] Arizona State University, School of Arts, Media and Engineering & School of Electrical, Computer and Energy Engineering, Geometric Media Lab, Tempe, AZ 85281, USA

Source:

NEUROCOMPUTING | 2023 / Vol. 518

Keywords:

Knowledge distillation; Angular distribution; Angular margin; Image classification

DOI:

10.1016/j.neucom.2022.11.029

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Knowledge distillation, as a broad class of methods, has led to the development of lightweight and memory-efficient models by using a pre-trained, high-capacity model (the teacher network) to train a smaller model (the student network). Recently, variations of knowledge distillation that use activation maps of intermediate layers as the source of knowledge have been studied. In computer vision applications, the feature activations learned by a higher-capacity model are generally observed to contain richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the teacher's dual ability to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas. We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near-angular distributions seen in many feature extractors. We then create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the teacher's stronger discrimination between positive and negative features, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method is compatible with other learning techniques, such as fine-grained features, augmentation, and other distillation methods.

© 2022 Elsevier B.V. All rights reserved.
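The abstract describes AMD loss only at a high level. As a hedged illustration of the general geometry it refers to (projecting features onto a unit hypersphere, measuring the angle between a positive and a negative feature, and enlarging that angle with an additive angular margin), here is a minimal pure-Python sketch. It is not the paper's AMD loss. The function names, the default margin of 0.5 rad, the ArcFace-style cos(θ + m) form, and the squared-difference matching between teacher and student are all assumptions made for illustration. How positive and negative features are extracted from activation maps is not shown.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Project a feature vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def angle_between(u, v):
    """Angle in radians between two features after hypersphere projection."""
    u_hat, v_hat = l2_normalize(u), l2_normalize(v)
    cos = sum(a * b for a, b in zip(u_hat, v_hat))
    cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
    return math.acos(cos)


def margin_angle(pos, neg, margin=0.5):
    """HYPOTHETICAL: positive/negative angle enlarged by an additive margin, capped at pi."""
    return min(angle_between(pos, neg) + margin, math.pi)


def angular_distill_loss(t_pos, t_neg, s_pos, s_neg, margin=0.5):
    """HYPOTHETICAL: squared gap between the teacher's and student's cos(theta + m)."""
    t = math.cos(margin_angle(t_pos, t_neg, margin))
    s = math.cos(margin_angle(s_pos, s_neg, margin))
    return (t - s) ** 2


def batch_angular_distill_loss(teacher_pairs, student_pairs, margin=0.5):
    """Mean of the per-sample loss over (positive, negative) feature pairs."""
    losses = [
        angular_distill_loss(tp, tn, sp, sn, margin)
        for (tp, tn), (sp, sn) in zip(teacher_pairs, student_pairs)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [([1.0, 0.0], [0.0, 1.0])]  # teacher separates pos/neg by 90 degrees
    student = [([1.0, 0.0], [1.0, 1.0])]  # student separates them by only 45 degrees
    print(batch_angular_distill_loss(teacher, student))
```

In this sketch the teacher and student are compared through one scalar per pair rather than through raw feature vectors. The two networks can therefore produce features of different dimensionality. For example, `angular_distill_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])` is 0 because both angles are π/2.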

Pages: 466–481

Page count: 16
