Estimating and Maximizing Mutual Information for Knowledge Distillation

Cited by: 2
Authors
Shrivastava, Aman [1 ]
Qi, Yanjun [1 ]
Ordonez, Vicente [2 ]
Affiliations
[1] Univ Virginia, Charlottesville, VA 22903 USA
[2] Rice Univ, Houston, TX USA
Keywords
DOI
10.1109/CVPRW59228.2023.00010
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information between local and global feature representations of a teacher and a student network. We demonstrate through extensive experiments that this can be used to improve the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models, producing better models that can run on devices with limited computational resources. Our method is flexible: we can distill knowledge from teachers with arbitrary network architectures to arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities and different architectures, including students with extremely low capacity. We obtain 74.55% accuracy on CIFAR-100 with a ShuffleNetV2, up from a baseline accuracy of 69.8%, by distilling knowledge from a ResNet-50. On ImageNet we improve a ResNet-18 network from 68.88% to 70.32% accuracy (+1.44%) using a ResNet-34 teacher network.
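The contrastive objective described in the abstract can be illustrated with a minimal InfoNCE-style sketch for the global features: minimizing an in-batch cross-entropy over teacher-student similarity scores maximizes a lower bound on their mutual information. The projection heads, dimensions, and temperature below are illustrative assumptions, not the paper's exact critic or its local/global pairing scheme.

```python
# Hedged sketch of an InfoNCE-style lower bound on I(student; teacher) for
# global (pooled) features, usable as an auxiliary distillation loss.
# All names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalInfoNCE(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int, embed_dim: int = 128):
        super().__init__()
        # Project both feature spaces into a shared embedding space.
        self.student_proj = nn.Linear(student_dim, embed_dim)
        self.teacher_proj = nn.Linear(teacher_dim, embed_dim)

    def forward(self, student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        # student_feats: (B, student_dim), teacher_feats: (B, teacher_dim)
        s = F.normalize(self.student_proj(student_feats), dim=1)
        t = F.normalize(self.teacher_proj(teacher_feats), dim=1)
        logits = s @ t.t() / 0.07  # (B, B) similarity scores; 0.07 is an assumed temperature
        labels = torch.arange(s.size(0), device=s.device)
        # Cross-entropy against the matching teacher sample, with the rest of the
        # batch acting as negatives; minimizing this maximizes an InfoNCE lower
        # bound on the student-teacher mutual information.
        return F.cross_entropy(logits, labels)

# Usage (assumed training loop): add the contrastive term to the task loss, e.g.
#   loss = task_loss + lambda_mi * GlobalInfoNCE(512, 2048)(student_pool, teacher_pool)
```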
Pages: 48-57
Page count: 10
Related papers
(50 in total)
  • [1] Knowledge Distillation for Object Detection Based on Mutual Information
    Liu, Xi
    Zhu, Ziqi
    2021 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT AUTONOMOUS SYSTEMS (ICOIAS 2021), 2021, : 18 - 23
  • [2] On Binary Quantizer For Maximizing Mutual Information
    Nguyen, Thuan Duc
    Nguyen, Thinh
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2020, 68 (09) : 5435 - 5445
  • [3] SMKD: Selective Mutual Knowledge Distillation
    Li, Ziyun
    Wang, Xinshao
    Robertson, Neil M.
    Clifton, David A.
    Meinel, Christoph
    Yang, Haojin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [4] GLACIER SURFACE MONITORING BY MAXIMIZING MUTUAL INFORMATION
    Erten, Esra
    Rossi, Cristian
    Hajnsek, Irena
    XXII ISPRS CONGRESS, TECHNICAL COMMISSION VII, 2012, 39 (B7): : 41 - 44
  • [5] Feature Selection by Maximizing Part Mutual Information
    Gao, Wanfu
    Hu, Liang
    Zhang, Ping
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MACHINE LEARNING (SPML 2018), 2018, : 120 - 127
  • [6] Estimating α-Rank by Maximizing Information Gain
    Rashid, Tabish
    Zhang, Cheng
    Ciosek, Kamil
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 5673 - 5681
  • [7] Clustering by Maximizing Mutual Information Across Views
    Kien Do
    Truyen Tran
    Venkatesh, Svetha
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9908 - 9918
  • [8] Feature Fusion for Online Mutual Knowledge Distillation
    Kim, Jangho
    Hyun, Minsung
    Chung, Inseop
    Kwak, Nojun
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 4619 - 4625
  • [9] Maximizing discrimination capability of knowledge distillation with energy function
    Kim, Seonghak
    Ham, Gyeongdo
    Lee, Suin
    Jang, Donggon
    Kim, Daeshik
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [10] Estimating Mutual Information on Data Streams
    Keller, Fabian
    Mueller, Emmanuel
    Boehm, Klemens
    PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2015,