On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches

Cited: 0
Authors
Lyu, Zhaoyan [1 ]
Aminian, Gholamali [2 ]
Rodrigues, Miguel R. D. [1 ]
Affiliations
[1] UCL, Dept Elect & Elect Engn, Gower St, London WC1E 6BT, England
[2] British Lib, Alan Turing Inst, 96 Euston Rd, London NW1 2DB, England
Keywords
deep learning; information theory; information bottleneck; generalization; fitting; compression;
DOI
10.3390/e25071063
Chinese Library Classification (CLC)
O4 [Physics];
Discipline Code
0702;
Abstract
It is well known that the learning process of a neural network, along with its connections to fitting, compression, and generalization, is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, replacing the mutual information measures (which are notoriously difficult to estimate in high-dimensional spaces) with other, more tractable ones, namely (1) the minimum mean-squared error associated with reconstructing the network input data from some intermediate network representation and (2) the cross-entropy associated with a class label given some network representation. We then conduct an empirical study to ascertain how different network models, learning algorithms, and datasets affect the learning dynamics. Our experiments show that the proposed approach appears to be more reliable than classical information-bottleneck approaches in capturing network dynamics during both the training and testing phases. They also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases.
Pages: 28
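As a rough illustration of the two proxy quantities described in the abstract, the sketch below fits a small auxiliary decoder and a linear probe on a hidden layer's detached activations, using their MSE and cross-entropy losses as tractable stand-ins for the input-side and label-side mutual information. This is a minimal sketch in PyTorch; the module names, layer sizes, and training loop are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (PyTorch): per-layer proxy measures replacing mutual information.
# All architecture choices below are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyTracker(nn.Module):
    """Estimates (1) an MSE proxy for reconstructing the input X from a hidden
    representation T and (2) the cross-entropy of the label Y given T."""
    def __init__(self, rep_dim, input_dim, num_classes):
        super().__init__()
        # Auxiliary decoder T -> X_hat: its best achievable MSE stands in for the MMSE.
        self.decoder = nn.Sequential(nn.Linear(rep_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))
        # Auxiliary linear probe T -> class logits: its cross-entropy stands in for H(Y | T).
        self.probe = nn.Linear(rep_dim, num_classes)

    def losses(self, t, x, y):
        mse = F.mse_loss(self.decoder(t), x)    # input-reconstruction proxy (compression side)
        ce = F.cross_entropy(self.probe(t), y)  # label-relevance proxy (fitting side)
        return mse, ce

# Usage: detach the activations of the layer under study and fit the tracker on them,
# logging (mse, ce) across training epochs of the network being analyzed.
if __name__ == "__main__":
    rep_dim, input_dim, num_classes = 64, 784, 10
    tracker = ProxyTracker(rep_dim, input_dim, num_classes)
    opt = torch.optim.Adam(tracker.parameters(), lr=1e-3)
    t = torch.randn(32, rep_dim)              # stand-in for detached layer activations
    x = torch.randn(32, input_dim)            # stand-in for the corresponding inputs
    y = torch.randint(0, num_classes, (32,))  # stand-in for the class labels
    for _ in range(100):
        mse, ce = tracker.losses(t, x, y)
        opt.zero_grad()
        (mse + ce).backward()
        opt.step()
    print(f"MSE proxy: {mse.item():.4f}, CE proxy: {ce.item():.4f}")
```

Tracking how these two quantities evolve over training epochs, for each layer, gives the fitting/compression trajectories the abstract refers to without requiring high-dimensional mutual information estimation.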