Bayesian Model Selection, the Marginal Likelihood, and Generalization

Cited by: 0
Authors
Lotfi, Sanae [1 ]
Izmailov, Pavel [1 ]
Benton, Gregory [1 ]
Goldblum, Micah [1 ]
Wilson, Andrew Gordon [1 ]
Affiliations
[1] NYU, New York, NY 10003 USA
Keywords
CHOICE
DOI: not available
CLC classification: TP18 (theory of artificial intelligence)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
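The marginal likelihood and the conditional marginal likelihood discussed in the abstract can be illustrated through the chain-rule decomposition log p(D) = Σ_i log p(d_i | d_{<i}); the conditional variant simply drops the first m terms, i.e. it evaluates log p(D_{>m} | D_{≤m}). The sketch below computes both exactly in a toy conjugate Normal-Normal model (known noise, Gaussian prior on the mean). This is an illustrative sketch only, not the paper's code; the model choice and the function name `lml_and_clml` are assumptions made here.

```python
import math

def lml_and_clml(data, m=2, tau2=1.0, sigma2=1.0):
    """Chain-rule log marginal likelihood for a conjugate model:
    mu ~ N(0, tau2), x_i | mu ~ N(mu, sigma2).
    Returns (log ML, conditional log ML given the first m points)."""
    mu_n, v_n = 0.0, tau2  # current posterior over mu (starts at the prior)
    log_terms = []
    for x in data:
        # One-step-ahead predictive: x ~ N(mu_n, v_n + sigma2)
        pred_var = v_n + sigma2
        log_terms.append(-0.5 * (math.log(2 * math.pi * pred_var)
                                 + (x - mu_n) ** 2 / pred_var))
        # Conjugate posterior update for mu after observing x
        v_post = 1.0 / (1.0 / v_n + 1.0 / sigma2)
        mu_n = v_post * (mu_n / v_n + x / sigma2)
        v_n = v_post
    lml = sum(log_terms)          # log p(x_1, ..., x_n)
    clml = sum(log_terms[m:])     # log p(x_{m+1}, ..., x_n | x_1, ..., x_m)
    return lml, clml
```

The conditional marginal likelihood discards the early predictive terms, which are dominated by the prior; this is one way to see why it can track generalization better than the full marginal likelihood when the prior is misspecified.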
Pages: 25