Bayesian Model Selection, the Marginal Likelihood, and Generalization

Cited by: 0
Authors
Lotfi, Sanae [1 ]
Izmailov, Pavel [1 ]
Benton, Gregory [1 ]
Goldblum, Micah [1 ]
Wilson, Andrew Gordon [1 ]
Affiliations
[1] NYU, New York, NY 10003 USA
Keywords
CHOICE
DOI: not available
CLC classification: TP18 (theory of artificial intelligence)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
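The marginal likelihood and the conditional marginal likelihood discussed in the abstract can be illustrated through the chain-rule decomposition log p(D) = Σ_i log p(d_i | d_{<i}); the conditional variant simply drops the first m terms, i.e. it evaluates log p(D_{>m} | D_{≤m}). The sketch below computes both exactly in a toy conjugate Normal-Normal model (known noise, Gaussian prior on the mean). This is an illustrative sketch only, not the paper's code; the model choice and the function name `lml_and_clml` are assumptions made here.

```python
import math

def lml_and_clml(data, m=2, tau2=1.0, sigma2=1.0):
    """Chain-rule log marginal likelihood for a conjugate model:
    mu ~ N(0, tau2), x_i | mu ~ N(mu, sigma2).
    Returns (log ML, conditional log ML given the first m points)."""
    mu_n, v_n = 0.0, tau2  # current posterior over mu (starts at the prior)
    log_terms = []
    for x in data:
        # One-step-ahead predictive: x ~ N(mu_n, v_n + sigma2)
        pred_var = v_n + sigma2
        log_terms.append(-0.5 * (math.log(2 * math.pi * pred_var)
                                 + (x - mu_n) ** 2 / pred_var))
        # Conjugate posterior update for mu after observing x
        v_post = 1.0 / (1.0 / v_n + 1.0 / sigma2)
        mu_n = v_post * (mu_n / v_n + x / sigma2)
        v_n = v_post
    lml = sum(log_terms)          # log p(x_1, ..., x_n)
    clml = sum(log_terms[m:])     # log p(x_{m+1}, ..., x_n | x_1, ..., x_m)
    return lml, clml
```

The conditional marginal likelihood discards the early predictive terms, which are dominated by the prior; this is one way to see why it can track generalization better than the full marginal likelihood when the prior is misspecified.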
Pages: 25