Estimating and Maximizing Mutual Information for Knowledge Distillation

Cited by: 2
Authors
Shrivastava, Aman [1 ]
Qi, Yanjun [1 ]
Ordonez, Vicente [2 ]
Affiliations
[1] Univ Virginia, Charlottesville, VA 22903 USA
[2] Rice Univ, Houston, TX USA
DOI
10.1109/CVPRW59228.2023.00010
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information between local and global feature representations of a teacher and a student network. We demonstrate through extensive experiments that this can improve the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models, yielding better models that can run on devices with limited computational resources. Our method is flexible: we can distill knowledge from a teacher with an arbitrary network architecture to an arbitrary student network. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities and architectures, including students with extremely low capacity. We obtain 74.55% accuracy on CIFAR-100 with a ShuffleNetV2 student (up from a 69.8% baseline) by distilling knowledge from a ResNet-50 teacher. On ImageNet, we improve a ResNet-18 network from 68.88% to 70.32% accuracy (+1.44%) using a ResNet-34 teacher.
Pages: 48-57
Page count: 10
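The abstract describes a contrastive objective that estimates and maximizes a lower bound on the mutual information between teacher and student features. As a rough illustration only, below is a minimal InfoNCE-style sketch of such a bound for global features in PyTorch. The function name, the cosine-similarity critic, and the temperature are assumptions for the sketch, not the authors' implementation, which also covers local features and may use a different critic and bound.

```python
import math
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(student_feats, teacher_feats, temperature=0.5):
    """InfoNCE bound: I(S; T) >= log(N) - L_NCE for N paired features.

    student_feats, teacher_feats: (N, D) global feature vectors for the
    same N images. A projection head to match differing teacher/student
    dimensions is assumed to have been applied already.
    """
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    logits = s @ t.t() / temperature              # (N, N) pairwise scores
    labels = torch.arange(s.size(0), device=s.device)
    # Matched student/teacher pairs sit on the diagonal (positives);
    # all other entries act as negatives.
    nce_loss = F.cross_entropy(logits, labels)
    return math.log(s.size(0)) - nce_loss

# Toy usage: maximizing the bound (minimizing its negative) pulls matched
# student/teacher features together and pushes mismatched ones apart.
s = torch.randn(8, 128, requires_grad=True)
t = torch.randn(8, 128)
bound = infonce_mi_lower_bound(s, t)
(-bound).backward()
```

Because the bound is tight only up to log(N), larger batches allow a larger estimated mutual information; in a distillation loop, this term would typically be added to the standard cross-entropy loss on the student's predictions.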