Leveraging angular distributions for improved knowledge distillation

Cited by: 5
Authors
Jeon, Eun Som [1 ]
Choi, Hongjun [1 ]
Shukla, Ankita [1 ]
Turaga, Pavan [1 ]
Affiliations
[1] Arizona State Univ, Sch Arts, Media & Engn & Sch Elect Comp & Energy Engn, Geometr Media Lab, Tempe, AZ 85281 USA
Keywords
Knowledge distillation; Angular distribution; Angular margin; Image classification;
DOI
10.1016/j.neucom.2022.11.029
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation as a broad class of methods has led to the development of lightweight and memory efficient models, using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, additional variations for knowledge distillation, utilizing activation maps of intermediate layers as the source of knowledge, have been studied. Generally, in computer vision applications, it is seen that the feature activation learned by a higher-capacity model contains richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the teacher's dual ability to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas.
We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near angular distributions seen in many feature extractors. Then, we create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the teacher's higher discrimination of positive and negative features, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method has advantages in compatibility with other learning techniques, such as using fine-grained features, augmentation, and other distillation methods. (c) 2022 Elsevier B.V. All rights reserved.
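To make the abstract's description concrete, the following is a minimal PyTorch sketch of an angular-margin-style distillation term. It is an illustration under assumptions, not the paper's exact AMD loss: the function name angular_margin_distillation_loss, the binary pos_mask used to separate positive (object) from negative (background) locations, and the specific way the margin enters the penalty are hypothetical choices made here for readability.

    import torch
    import torch.nn.functional as F

    def angular_margin_distillation_loss(f_student, f_teacher, pos_mask, margin=0.1):
        # f_student, f_teacher: (B, C, H, W) activation maps from matching layers.
        # pos_mask: (B, 1, H, W) binary map marking "positive" (object) locations,
        # assumed here to come from the teacher's attention; margin is in radians.
        s = F.normalize(f_student, dim=1)          # project onto the unit hypersphere
        t = F.normalize(f_teacher, dim=1)
        cos_st = (s * t).sum(dim=1, keepdim=True)  # per-location cosine similarity
        theta = torch.acos(cos_st.clamp(-1 + 1e-7, 1 - 1e-7))  # angular distance
        # Add an angular margin at positive locations, so the student is pushed to
        # align more tightly with the teacher on object regions than on background.
        theta_m = torch.where(pos_mask > 0.5, theta + margin, theta)
        return (theta_m ** 2).mean()

    # Shape-only usage example with random tensors:
    fs, ft = torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8)
    mask = (torch.rand(2, 1, 8, 8) > 0.5).float()
    loss = angular_margin_distillation_loss(fs, ft, mask)

Penalizing the margin-augmented angle itself, rather than its cosine, mirrors common angular-margin practice; in an actual distillation setup such a term would typically be added to, not substituted for, the usual logit-matching objective.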
Pages: 466-481
Number of pages: 16
Related Papers
50 records in total
  • [1] Leveraging different learning styles for improved knowledge distillation in biomedical imaging
    Niyaz, Usma
    Sambyal, Abhishek Singh
    Bathula, Deepti R.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 168
  • [2] Leveraging logit uncertainty for better knowledge distillation
    Guo, Zhen
    Wang, Dong
    He, Qiang
    Zhang, Pengzhou
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [3] MIXED BANDWIDTH ACOUSTIC MODELING LEVERAGING KNOWLEDGE DISTILLATION
    Fukuda, Takashi
    Thomas, Samuel
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 509 - 515
  • [4] Improved Knowledge Distillation via Teacher Assistant
    Mirzadeh, Seyed Iman
    Farajtabar, Mehrdad
    Li, Ang
    Levine, Nir
    Matsukawa, Akihiro
    Ghasemzadeh, Hassan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5191 - 5198
  • [5] Leveraging Contrastive Learning and Knowledge Distillation for Incomplete Modality Rumor Detection
Xu, Fan
    Fan, Pinyun
    Huang, Qi
    Zou, Bowei
Aw, AiTi
    Wang, Mingwen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 13492 - 13503
  • [6] Leveraging Speech Production Knowledge for Improved Speech Recognition
    Sangwan, Abhijeet
    Hansen, John H. L.
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 58 - 63
  • [7] Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models
    Yoon, Ji Won
    Kim, Hyung Yong
    Lee, Hyeonseung
    Ahn, Sunghwan
    Kim, Nam Soo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2974 - 2987
  • [8] Multimodal fusion and knowledge distillation for improved anomaly detection
    Lu, Meichen
    Chai, Yi
    Xu, Kaixiong
    Chen, Weiqing
    Ao, Fei
    Ji, Wen
    VISUAL COMPUTER, 2024,
  • [9] Improved Knowledge Distillation for Crowd Counting on IoT Devices
    Huang, Zuo
    Sinnott, Richard O.
    2023 IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND COMMUNICATIONS, EDGE, 2023, : 207 - 214