Leveraging angular distributions for improved knowledge distillation

Cited by: 5
Authors
Jeon, Eun Som [1]
Choi, Hongjun [1]
Shukla, Ankita [1]
Turaga, Pavan [1]
Affiliations
[1] Arizona State Univ, Sch Arts, Media & Engn & Sch Elect Comp & Energy Engn, Geometr Media Lab, Tempe, AZ 85281 USA
Keywords
Knowledge distillation; Angular distribution; Angular margin; Image classification
DOI
10.1016/j.neucom.2022.11.029
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation, as a broad class of methods, has led to the development of lightweight and memory-efficient models by using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, variations of knowledge distillation that utilize activation maps of intermediate layers as the source of knowledge have been studied. Generally, in computer vision applications, the feature activations learned by a higher-capacity model contain richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the teacher's ability to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas.

We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near-angular distributions seen in many feature extractors. We then create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the teacher's higher discrimination of positive and negative features, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method is compatible with other learning techniques, such as using fine-grained features, augmentation, and other distillation methods.

© 2022 Elsevier B.V. All rights reserved.
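To make the mechanics described in the abstract concrete, below is a minimal PyTorch sketch of the angular-margin idea: features are projected onto the unit hypersphere, the angle between positive and negative features is widened by an additive margin, and the student is trained to match the teacher's margin-augmented separation. This is an illustrative reconstruction, not the authors' reference implementation; it assumes positive and negative features have already been extracted (e.g., by splitting intermediate activations using the teacher's attention), and the function names, margin value, and mean-squared matching loss are all assumptions.

```python
import torch
import torch.nn.functional as F

def margin_similarity(pos, neg, margin=0.1):
    # Project both feature sets onto the unit hypersphere (assumed shape: [batch, dim]).
    pos = F.normalize(pos, dim=1)
    neg = F.normalize(neg, dim=1)
    # Cosine of the angle between positive and negative features,
    # clamped for numerical stability before acos.
    cos_theta = (pos * neg).sum(dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.acos(cos_theta)
    # Additive angular margin widens the positive/negative separation.
    return torch.cos(theta + margin)

def amd_loss(s_pos, s_neg, t_pos, t_neg, margin=0.1):
    # Student mimics the teacher's margin-augmented angular separation.
    return F.mse_loss(margin_similarity(s_pos, s_neg, margin),
                      margin_similarity(t_pos, t_neg, margin))

# Example usage with random stand-in features (batch of 8, 128-dim):
s_pos, s_neg = torch.randn(8, 128), torch.randn(8, 128)
t_pos, t_neg = torch.randn(8, 128), torch.randn(8, 128)
loss = amd_loss(s_pos, s_neg, t_pos, t_neg)
```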
Pages: 466-481
Number of pages: 16