Bayesian subset selection and variable importance for interpretable prediction and classification

Cited by: 0
Authors: Kowal, Daniel R. [1]
Affiliation: [1] Rice Univ, Dept Stat, Houston, TX 77005 USA
Funding: US National Institutes of Health
Keywords: education; linear regression; logistic regression; model selection; penalized regression; REGRESSION; MODELS; CHOICE; LASSO
DOI: not available
CLC number: TP [Automation and Computer Technology]
Subject classification code: 0812
Abstract
Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often avoided due to selection instability, lack of regularization, and difficulties with post-selection inference. We address these challenges from a Bayesian perspective. Given any Bayesian predictive model M, we extract a family of near-optimal subsets of variables for linear prediction or classification. This strategy deemphasizes the role of a single "best" subset and instead advances the broader perspective that often many subsets are highly competitive. The acceptable family of subsets offers a new pathway for model interpretation and is neatly summarized by key members such as the smallest acceptable subset, along with new (co-) variable importance metrics based on whether variables (co-) appear in all, some, or no acceptable subsets. More broadly, we apply Bayesian decision analysis to derive the optimal linear coefficients for any subset of variables. These coefficients inherit both regularization and predictive uncertainty quantification via M. For both simulated and real data, the proposed approach exhibits better prediction, interval estimation, and variable selection than competing Bayesian and frequentist selection methods. These tools are applied to a large education dataset with highly correlated covariates. Our analysis provides unique insights into the combination of environmental, socioeconomic, and demographic factors that predict educational outcomes, and identifies over 200 distinct subsets of variables that offer near-optimal out-of-sample predictive accuracy.
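The decision-analytic step described above can be sketched briefly: for a candidate subset S, the optimal linear coefficients minimize the expected squared loss under the posterior predictive distribution of M, which reduces to regressing the posterior predictive mean on the columns of X indexed by S. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation; the simulated "predictive draws" `y_tilde` stand in for draws from an arbitrary Bayesian model M, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations, p covariates, sparse true signal.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Stand-in for posterior predictive draws from some Bayesian model M
# (here: fitted values plus noise around a least-squares fit; any
# model's predictive draws could be substituted).
S_draws = 500
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_tilde = X @ beta_hat + rng.normal(scale=1.0, size=(S_draws, n))

def optimal_subset_coefs(X, y_tilde, subset):
    """Decision-analytic coefficients for subset S: minimize the
    expected squared loss E[||y_tilde - X_S b||^2] over predictive
    draws, i.e. regress the posterior predictive mean on X_S."""
    Xs = X[:, subset]
    y_bar = y_tilde.mean(axis=0)  # posterior predictive mean
    b, *_ = np.linalg.lstsq(Xs, y_bar, rcond=None)
    return b

# Coefficients for the subset {0, 2, 4} of covariates.
coefs = optimal_subset_coefs(X, y_tilde, [0, 2, 4])
```

Because the coefficients are a deterministic functional of the predictive distribution, the same projection applied draw-by-draw (rather than to the mean) yields predictive uncertainty quantification for the subset-specific coefficients.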
Pages: 38
Related papers (50 total; first 10 shown)
  • [1] Kowal, Daniel R. (2022). Bayesian subset selection and variable importance for interpretable prediction and classification. Journal of Machine Learning Research, 23.
  • [2] Brown, P. J., Vannucci, M., & Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society Series B (Statistical Methodology), 60, 627-641.
  • [3] Brown, P. J., Vannucci, M., & Fearn, T. (1997). Multivariate Bayesian variable selection and prediction. Mining and Modeling Massive Data Sets in Science, Engineering, and Business with a Subtheme in Environmental Statistics, 29(1), 271.
  • [4] Zanella, G., & Roberts, G. (2019). Scalable importance tempering and Bayesian variable selection. Journal of the Royal Statistical Society Series B (Statistical Methodology), 81(3), 489-517.
  • [5] Sun, H. M. (2005). An accurate and interpretable Bayesian classification model for prediction of HERG liability. Abstracts of Papers of the American Chemical Society, 230, U1408-U1409.
  • [6] Sun, Hongmao (2006). An accurate and interpretable Bayesian classification model for prediction of hERG liability. ChemMedChem, 1(3), 315-322.
  • [7] Noe, D. A., & He, X. (2008). Partially Bayesian variable selection in classification trees. Statistics and Its Interface, 1(1), 155-167.
  • [8] Ni, S., Xia, Q., & Liu, J. (2018). Bayesian subset selection for two-threshold variable autoregressive models. Studies in Nonlinear Dynamics and Econometrics, 22(4).
  • [9] Wang, T., Rudin, C., Doshi-Velez, F., Liu, Y., Klampfl, E., & MacNeille, P. (2016). Bayesian rule sets for interpretable classification. 2016 IEEE 16th International Conference on Data Mining (ICDM), 1269-1274.
  • [10] Bien, J., & Tibshirani, R. (2011). Prototype selection for interpretable classification. Annals of Applied Statistics, 5(4), 2403-2424.