Leveraging angular distributions for improved knowledge distillation

Cited: 5
Authors
Jeon, Eun Som [1]
Choi, Hongjun [1]
Shukla, Ankita [1]
Turaga, Pavan [1]
Affiliations
[1] Arizona State University, School of Arts, Media and Engineering & School of Electrical, Computer and Energy Engineering, Geometric Media Lab, Tempe, AZ 85281, USA
Keywords
Knowledge distillation; Angular distribution; Angular margin; Image classification
DOI
10.1016/j.neucom.2022.11.029
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation, as a broad class of methods, has led to the development of lightweight and memory-efficient models by using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, additional variations of knowledge distillation that utilize activation maps of intermediate layers as the source of knowledge have been studied. Generally, in computer vision applications, the feature activations learned by a higher-capacity model contain richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the teacher's dual ability to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas.
We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near-angular distributions seen in many feature extractors. We then create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the teacher's higher discrimination of positive and negative features, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method is compatible with other learning techniques, such as using fine-grained features, augmentation, and other distillation methods. (c) 2022 Elsevier B.V. All rights reserved.
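The abstract describes the loss only at a high level, so the following PyTorch snippet is a minimal sketch of the idea under stated assumptions: positive/negative regions are obtained here by thresholding the teacher's channel-mean activation map, the function names (amd_loss, teacher_mask, pool_pos_neg) and the margin value are hypothetical, and student and teacher feature maps are assumed to share spatial size. It is not the authors' implementation.

```python
# Sketch of an angular margin-based distillation loss, written from the
# abstract's description only. How positive/negative regions are obtained
# and all names/values here are illustrative assumptions.
import math

import torch
import torch.nn.functional as F


def teacher_mask(teacher_feat: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    """Rough positive/negative split from the teacher: threshold the
    min-max normalized channel-mean activation map (an assumption)."""
    attn = teacher_feat.mean(dim=1, keepdim=True)           # (B, 1, H, W)
    lo = attn.amin(dim=(2, 3), keepdim=True)
    hi = attn.amax(dim=(2, 3), keepdim=True)
    attn = (attn - lo) / (hi - lo + 1e-6)
    return (attn > thresh).float()


def pool_pos_neg(feat: torch.Tensor, mask: torch.Tensor):
    """Average-pool a feature map (B, C, H, W) into a positive (masked)
    and a negative (unmasked) vector, each of shape (B, C)."""
    eps = 1e-6
    pos = (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + eps)
    neg = (feat * (1 - mask)).sum(dim=(2, 3)) / ((1 - mask).sum(dim=(2, 3)) + eps)
    return pos, neg


def amd_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor,
             margin: float = 0.1) -> torch.Tensor:
    """Angular margin-based distillation, sketched: project positive and
    negative features onto the unit hypersphere, measure the angle between
    them, widen the teacher's angle by a margin, and train the student to
    match that sharper separation."""
    mask = teacher_mask(teacher_feat)
    s_pos, s_neg = pool_pos_neg(student_feat, mask)
    t_pos, t_neg = pool_pos_neg(teacher_feat, mask)

    # Projection onto the unit hypersphere (L2 normalization).
    s_pos, s_neg = F.normalize(s_pos, dim=1), F.normalize(s_neg, dim=1)
    t_pos, t_neg = F.normalize(t_pos, dim=1), F.normalize(t_neg, dim=1)

    # Angular distance between positive and negative features, per sample.
    eps = 1e-6
    s_angle = torch.acos((s_pos * s_neg).sum(dim=1).clamp(-1 + eps, 1 - eps))
    t_angle = torch.acos((t_pos * t_neg).sum(dim=1).clamp(-1 + eps, 1 - eps))

    # Angular margin added to the teacher's separation: the student is pushed
    # to distinguish positive from negative regions at least as sharply.
    target = (t_angle + margin).clamp(max=math.pi)
    return F.mse_loss(s_angle, target)


# Hypothetical usage: teacher and student may differ in channel count,
# since only per-network angles are compared.
s_feat = torch.randn(4, 128, 8, 8, requires_grad=True)
t_feat = torch.randn(4, 256, 8, 8)
amd_loss(s_feat, t_feat).backward()
```

Comparing per-network angles rather than raw features keeps this sketch agnostic to channel-count differences between teacher and student; in practice such a term would be weighted and added to the usual cross-entropy and logit-distillation losses.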
Pages: 466-481
Page count: 16