Leveraging angular distributions for improved knowledge distillation

Cited by: 5
Authors
Jeon, Eun Som [1 ]
Choi, Hongjun [1 ]
Shukla, Ankita [1 ]
Turaga, Pavan [1 ]
Affiliations
[1] Arizona State Univ, Sch Arts, Media & Engn & Sch Elect Comp & Energy Engn, Geometr Media Lab, Tempe, AZ 85281 USA
Keywords
Knowledge distillation; Angular distribution; Angular margin; Image classification
DOI
10.1016/j.neucom.2022.11.029
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation as a broad class of methods has led to the development of lightweight and memory-efficient models, using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, additional variations for knowledge distillation, utilizing activation maps of intermediate layers as the source of knowledge, have been studied. Generally, in computer vision applications, it is seen that the feature activation learned by a higher-capacity model contains richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the teacher's dual ability to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas. We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near angular distributions seen in many feature extractors. Then, we create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the teacher's higher discrimination of positive and negative features, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method has advantages in compatibility with other learning techniques, such as using fine-grained features, augmentation, and other distillation methods. (c) 2022 Elsevier B.V. All rights reserved.
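The abstract only summarizes the method, but the core idea lends itself to a short sketch. The PyTorch fragment below is a minimal, illustrative approximation, not the paper's exact AMD loss: the mask-based pooling of positive and negative regions, the helper names (split_pos_neg, angular_margin_distill_loss), the margin value, and the MSE matching objective are all assumptions made for illustration.

```python
# Illustrative sketch only (assumptions noted above); not the authors' exact AMD loss.
import torch
import torch.nn.functional as F


def split_pos_neg(feat, mask):
    # feat: (B, C, H, W) feature map; mask: (B, 1, H, W) binary map of
    # object-relevant ("positive") locations. Returns average-pooled
    # positive and negative feature vectors of shape (B, C).
    pos = (feat * mask).flatten(2).sum(-1) / mask.flatten(2).sum(-1).clamp(min=1e-6)
    neg = (feat * (1.0 - mask)).flatten(2).sum(-1) / (1.0 - mask).flatten(2).sum(-1).clamp(min=1e-6)
    return pos, neg


def angular_margin_distill_loss(f_s, f_t, mask, margin=0.2):
    # Pool positive/negative features for student (f_s) and teacher (f_t).
    pos_s, neg_s = split_pos_neg(f_s, mask)
    pos_t, neg_t = split_pos_neg(f_t, mask)

    # Cosine similarity is the cosine of the angle between the vectors after
    # projection onto the unit hypersphere (cosine_similarity normalizes internally).
    cos_s = F.cosine_similarity(pos_s, neg_s, dim=1)
    cos_t = F.cosine_similarity(pos_t, neg_t, dim=1)

    # Widen the teacher's positive/negative angle by an angular margin, so the
    # separation the student must match is stricter than the teacher's own.
    theta_t = torch.acos(cos_t.clamp(-1.0 + 1e-6, 1.0 - 1e-6))
    target = torch.cos(theta_t + margin).detach()  # teacher side is not trained

    # Penalize the student when its positive/negative angular separation
    # falls short of the margin-widened teacher target.
    return F.mse_loss(cos_s, target)


# Hypothetical usage: the mask could come from thresholding the teacher's mean activation.
#   attn = f_t.mean(dim=1, keepdim=True)
#   mask = (attn > attn.mean(dim=(2, 3), keepdim=True)).float()
#   loss = task_loss + alpha * angular_margin_distill_loss(f_s, f_t, mask)
```

Because both features are compared only through their angle on the unit hypersphere, the term is insensitive to feature magnitude, and the added margin makes the transferred positive/negative separation deliberately stricter than what the teacher alone exhibits, which is the intuition the abstract describes.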
Pages: 466-481
Number of pages: 16
Related papers (50 in total)
  • [41] Angular distributions of photoelectrons. Morgenstern, R.; Niehaus, A.; Ruf, M. W. Chemical Physics Letters, 1970, 4(10): 635-638.
  • [42] Angular distributions in multifragmentation. Stoenner, R. W.; Klobuchar, R. L.; Haustein, P. E.; Virtes, G. J.; Cumming, J. B.; Loveland, W. Physical Review C, 2006, 73(04).
  • [43] Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation. Kurata, Gakuto; Audhkhasi, Kartik. INTERSPEECH 2019, 2019: 1616-1620.
  • [44] Discover the Effective Strategy for Face Recognition Model Compression by Improved Knowledge Distillation. Wang, Mengjiao; Liu, Rujie; Abe, Narishige; Uchida, Hidetsugu; Matsunami, Tomoaki; Yamada, Shigefumi. 2018 25th IEEE International Conference on Image Processing (ICIP), 2018: 2416-2420.
  • [45] Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model. Wang, Mengjiao; Liu, Rujie; Hajime, Nada; Abe, Narishige; Uchida, Hidetsugu; Matsunami, Tomoaki. 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019: 2655-2661.
  • [46] Knowledge Augmentation for Distillation: A General and Effective Approach to Enhance Knowledge Distillation. Tang, Yinan; Guo, Zhenhua; Wang, Li; Fan, Baoyu; Cao, Fang; Gao, Kai; Zhang, Hongwei; Li, Rengang. Proceedings of the 1st International Workshop on Efficient Multimedia Computing under Limited Resources (EMCLR 2024), 2024: 23-31.
  • [47] Leveraging Corporate Knowledge. Cronau, D. A. Library Review, 2006, 55(03): 230+.
  • [48] Deep dive into clarity: Leveraging signal-to-noise ratio awareness and knowledge distillation for underwater image enhancement. Fan, Guodong; Zhou, Jingchun; Xu, Chengpei; Cheng, Zheng. Expert Systems with Applications, 2025, 269.
  • [49] Explaining Knowledge Distillation by Quantifying the Knowledge. Cheng, Xu; Rao, Zhefan; Chen, Yilan; Zhang, Quanshi. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 12922-12932.
  • [50] Weighted Knowledge Based Knowledge Distillation. Kang, S.; Seo, K. Transactions of the Korean Institute of Electrical Engineers, 2022, 71(02): 431-435.