Is My Neural Net Driven by the MDL Principle?

Cited by: 0
Authors
Brandao, Eduardo [1]
Duffner, Stefan [2]
Emonet, Rémi [1]
Habrard, Amaury [1,3]
Jacquenet, François [1]
Sebban, Marc [1]
Affiliations
[1] Univ Jean Monnet St Etienne, CNRS, Inst Opt Grad Sch, Lab Hubert Curien, UMR 5516, F-42023 St Etienne, France
[2] Univ Lyon, CNRS, INSA Lyon, LIRIS, UMR 5205, F-69621 Villeurbanne, France
[3] Inst Univ France (IUF), Paris, France
Keywords
Neural Networks; MDL; Signal-Noise; Point Jacobians; Minimum Description Length
DOI
10.1007/978-3-031-43415-0_11
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The Minimum Description Length (MDL) principle is a formalization of Occam's razor for model selection: a good model is one that losslessly compresses the data while accounting for the cost of describing the model itself. While MDL naturally expresses the behavior of certain models, such as autoencoders (which inherently compress data), most representation learning techniques do not rely on such models. Instead, they learn representations by training on general or, in self-supervised learning, pretext tasks. In this paper, we propose a new formulation of the MDL principle that relies on the concepts of signal and noise, which are implicitly defined by the learning task at hand. Additionally, we introduce ways to empirically measure the complexity of the learned representations by analyzing the spectra of the point Jacobians. Under certain assumptions, we show that the singular values of the point Jacobians of neural networks driven by the MDL principle should follow either a power law or a lognormal distribution. Finally, we conduct experiments to evaluate the behavior of the proposed measure applied to deep neural networks on different datasets, with respect to several types of noise. We observe that the empirical spectral distribution agrees with the spectral distribution predicted by our MDL principle, suggesting that neural networks trained with gradient descent on noisy data implicitly abide by the MDL principle.
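Note (illustrative, not part of the original record). The MDL criterion the abstract invokes is the standard two-part codelength minimization, written here in common textbook notation rather than quoted from the paper:

\[
M^{\ast} \;=\; \operatorname*{arg\,min}_{M \in \mathcal{M}} \bigl[\, L(M) + L(D \mid M) \,\bigr]
\]

where \(L(M)\) is the cost of describing the model and \(L(D \mid M)\) the cost of losslessly describing the data given the model.

The spectral measurement the abstract describes can likewise be sketched. The following is a minimal illustration, not the authors' code: the toy MLP, its dimensions, and the fitting choices are assumptions made for the example. It computes the Jacobian of a network at individual input points (the "point Jacobians"), pools their singular values, and compares the empirical spectrum against the two candidate families the paper predicts, lognormal and power law.

import numpy as np
import torch
from scipy import stats

torch.manual_seed(0)

# Toy MLP standing in for a trained representation extractor (assumed).
net = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 64),
)

def point_jacobian_spectrum(x):
    # Singular values of the network's Jacobian at a single input point x.
    J = torch.autograd.functional.jacobian(net, x)  # shape (64, 64)
    return torch.linalg.svdvals(J).detach().numpy()

# Pool spectra over a sample of input points.
sv = np.concatenate([point_jacobian_spectrum(torch.randn(64)) for _ in range(32)])
sv = sv[sv > 1e-12]  # discard numerically-zero singular values

# Lognormal fit (location pinned at 0, as is usual for positive data).
sigma, _, scale = stats.lognorm.fit(sv, floc=0)
print(f"lognormal fit: sigma={sigma:.3f}, scale={scale:.3f}")

# Crude power-law check: slope of the log-log rank-size plot of the upper tail.
tail = np.sort(sv)[::-1][: len(sv) // 4]
ranks = np.arange(1, len(tail) + 1)
slope, _ = np.polyfit(np.log(ranks), np.log(tail), 1)
print(f"power-law tail exponent estimate: {slope:.3f}")

A goodness-of-fit test (e.g. scipy.stats.kstest against the fitted lognormal) would be the natural next step for deciding between the two candidate distributions.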
Pages: 173-189
Page count: 17