Estimating and Maximizing Mutual Information for Knowledge Distillation

Cited by: 2
Authors
Shrivastava, Aman [1 ]
Qi, Yanjun [1 ]
Ordonez, Vicente [2 ]
Affiliations
[1] Univ Virginia, Charlottesville, VA 22903 USA
[2] Rice Univ, Houston, TX USA
DOI
10.1109/CVPRW59228.2023.00010
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information between local and global feature representations of a teacher and a student network. We demonstrate through extensive experiments that this can be used to improve the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models, making it possible to produce better models that can run on devices with limited computational resources. Our method is flexible: we can distill knowledge from teachers with arbitrary network architectures to arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities, with different architectures, and when the student network has extremely low capacity. We obtain 74.55% accuracy on CIFAR100 with a ShuffleNetV2 student, up from a baseline accuracy of 69.8%, by distilling knowledge from a ResNet-50 teacher. On ImageNet we improve a ResNet-18 network from 68.88% to 70.32% accuracy (+1.44%) using a ResNet-34 teacher network.
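To make the stated objective concrete, the following is a minimal sketch of a contrastive (InfoNCE-style) lower bound on the mutual information between the global feature vectors of a teacher and a student, assuming PyTorch. It is an illustration, not the authors' released implementation: the class and parameter names (GlobalInfoNCE, proj_dim, temperature) and the choice of critic are assumptions, and the paper's additional local-to-global feature pairing is omitted.

```python
# A minimal sketch (assumption: PyTorch), not the authors' released code, of an
# InfoNCE-style contrastive lower bound on the mutual information between
# teacher and student global features. Names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalInfoNCE(nn.Module):
    """Contrastive critic over (student, teacher) global feature pairs."""

    def __init__(self, student_dim, teacher_dim, proj_dim=128, temperature=0.1):
        super().__init__()
        # Project both feature spaces into a shared embedding space.
        self.student_proj = nn.Linear(student_dim, proj_dim)
        self.teacher_proj = nn.Linear(teacher_dim, proj_dim)
        self.temperature = temperature

    def forward(self, student_feats, teacher_feats):
        # student_feats: (B, student_dim), teacher_feats: (B, teacher_dim)
        s = F.normalize(self.student_proj(student_feats), dim=1)
        t = F.normalize(self.teacher_proj(teacher_feats), dim=1)
        logits = s @ t.t() / self.temperature  # (B, B) pairwise similarities
        labels = torch.arange(s.size(0), device=s.device)
        # Matching (student, teacher) pairs are positives; the other pairs in the
        # batch act as negatives. Minimizing this cross-entropy maximizes the
        # InfoNCE lower bound on I(student; teacher).
        return F.cross_entropy(logits, labels)
```

In a distillation setup of this kind, such a loss would typically be added with a weighting coefficient to the student's standard supervised loss while the teacher remains frozen.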
Pages: 48-57
Number of pages: 10
Related papers
50 records in total
  • [21] Federated Split Learning via Mutual Knowledge Distillation
    Luo, Linjun
    Zhang, Xinglin
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11 (03) : 2729 - 2741
  • [22] Distillation protocols: Output entanglement and local mutual information
    Horodecki, M
    Oppenheim, J
    Sen, A
    Sen, U
    PHYSICAL REVIEW LETTERS, 2004, 93 (17) : 170503 - 1
  • [23] Variational Information Distillation for Knowledge Transfer
    Ahn, Sungsoo
    Hu, Shell Xu
    Damianou, Andreas
    Lawrence, Neil D.
    Dai, Zhenwen
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 9155 - 9163
  • [24] Knowledge Distillation with Information Compressed Representations
    Zhang, Yao
    Zhang, Xuejie
    Wang, Jin
    Zhou, Xiaobing
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 402 - 413
  • [25] Knowledge Distillation via Information Matching
    Zhu, Honglin
    Jiang, Ning
    Tang, Jialiang
    Huang, Xinlei
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT IV, 2024, 14450 : 405 - 417
  • [26] Automatic Threshold Selection Guided by Maximizing Normalized Mutual Information
    Zou Y.-B.
    Lei B.-J.
    Zang Z.-X.
    Wang J.-Y.
    Hu Z.-H.
    Dong F.-M.
    Zidonghua Xuebao/Acta Automatica Sinica, 2019, 45 (07) : 1373 - 1385
  • [27] Estimating Mutual Information via Geodesic kNN
    Marx, Alexander
    Fischer, Jonas
    PROCEEDINGS OF THE 2022 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2022, : 415 - 423
  • [28] Automatic Extrinsic Calibration of Vision and Lidar by Maximizing Mutual Information
    Pandey, Gaurav
    McBride, James R.
    Savarese, Silvio
    Eustice, Ryan M.
    JOURNAL OF FIELD ROBOTICS, 2015, 32 (05) : 696 - 722
  • [29] Estimating Total Correlation with Mutual Information Estimators
    Bai, Ke
    Cheng, Pengyu
    Hao, Weituo
    Henao, Ricardo
    Carin, Lawrence
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [30] Optimal Order Reduction of Probability Distributions by Maximizing Mutual Information
    Vidyasagar, M.
    2011 50TH IEEE CONFERENCE ON DECISION AND CONTROL AND EUROPEAN CONTROL CONFERENCE (CDC-ECC), 2011, : 716 - 721