Model selection for linear classifiers using Bayesian error estimation

Times cited: 13
Authors
Huttunen, Heikki [1]
Tohka, Jussi [2,3]
Affiliations
[1] Tampere Univ Technol, Dept Signal Proc, FIN-33101 Tampere, Finland
[2] Univ Carlos III Madrid, Dept Bioengn & Aerosp Engn, E-28903 Getafe, Spain
[3] Inst Invest Sanitaria Gregorio Maranon, Madrid, Spain
Keywords
Logistic regression; Support vector machine; Regularization; Bayesian error estimator; Linear classifier; Classification; Performance
DOI
10.1016/j.patcog.2015.05.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Regularized linear models are important classification methods for high-dimensional problems, where their ability to avoid overfitting often makes them the preferred choice. The degrees of freedom of the model are determined by a regularization parameter, which is typically selected using counting-based approaches such as K-fold cross-validation (CV). For large data sets this can be very time-consuming, and for small sample sizes the accuracy of model selection is limited by the large variance of CV error estimates. In this paper, we study the applicability of a recently proposed Bayesian error estimator for selecting the best model along the regularization path. We also propose an extension of the estimator that allows model selection in multiclass cases, and we study its efficiency with L1-regularized logistic regression and the L2-regularized linear support vector machine. Experiments show that model selection with the new Bayesian error estimator improves classification accuracy, especially in small-sample situations, and avoids the excess variability inherent to traditional cross-validation approaches. Moreover, the method has significantly lower computational complexity than cross-validation. (C) 2015 Elsevier Ltd. All rights reserved.
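The abstract does not state the estimator's formula. As a minimal sketch of the idea only, assume two Gaussian classes with a known, shared covariance Sigma, a flat prior on each class mean, and known class priors p0 and p1 = 1 - p0. These are simplifying assumptions chosen for illustration, not necessarily the model used in the paper. Under them, the posterior of the class-y mean is N(m_y, Sigma / n_y), where m_y is the sample mean of the n_y training points, and the posterior expected error of the linear rule "predict class 1 if a^T x + b > 0" has the closed form p0 Phi((a^T m0 + b) / sqrt(a^T Sigma a (1 + 1/n0))) + p1 Phi(-(a^T m1 + b) / sqrt(a^T Sigma a (1 + 1/n1))), with Phi the standard normal CDF. Because the estimate uses only the training data, each candidate on the regularization path needs a single fit. The names gaussian_bee and select_along_path are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def quad_form(a, sigma):
    """Compute a^T Sigma a for a list-of-lists covariance matrix."""
    d = len(a)
    return sum(a[i] * sigma[i][j] * a[j] for i in range(d) for j in range(d))


def gaussian_bee(a, b, m0, m1, sigma, n0, n1, p0=0.5):
    """Posterior expected error of the rule 'predict 1 if a^T x + b > 0'.

    Hypothetical simplified model (an assumption, not the paper's exact one):
    Gaussian classes with known shared covariance `sigma`, a flat prior on
    each class mean (posterior N(m_y, sigma / n_y)), and class prior p0.
    """
    s2 = quad_form(a, sigma)
    if s2 <= 0.0:
        raise ValueError("a^T Sigma a must be positive")
    # Class 0 is misclassified when a^T x + b > 0; averaging the Gaussian
    # error over the posterior of mu0 inflates the variance by (1 + 1/n0).
    e0 = norm_cdf((dot(a, m0) + b) / math.sqrt(s2 * (1.0 + 1.0 / n0)))
    # Class 1 is misclassified when a^T x + b <= 0.
    e1 = norm_cdf(-(dot(a, m1) + b) / math.sqrt(s2 * (1.0 + 1.0 / n1)))
    return p0 * e0 + (1.0 - p0) * e1


def select_along_path(candidates, estimate):
    """Return (index, score) of the candidate with the lowest estimated error."""
    scores = [estimate(c) for c in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

For example, with m0 = [-1, 0], m1 = [1, 0], an identity covariance and three hypothetical candidate rules a = [1, 0], [1, 1], [0, 1] (all with b = 0), select_along_path picks the first rule, whose estimated error is lowest. K-fold CV would instead refit K models for every candidate, which is one source of the computational advantage over cross-validation.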
Pages: 3739-3748
Number of pages: 10
Related papers (50 in total)
  • [41] Bayesian bandwidth estimation for a semi-functional partial linear regression model with unknown error density
    Shang, Han Lin
    COMPUTATIONAL STATISTICS, 2014, 29 (3-4) : 829 - 848
  • [42] Bayesian estimation and model selection of threshold spatial Durbin model
    Zhu, Yanli
    Han, Xiaoyi
    Chen, Ying
    ECONOMICS LETTERS, 2020, 188
    Varma, S
    Simon, R
    BMC BIOINFORMATICS, 2006, 7 (1)
  • [46] Feature selection for ensembles of simple Bayesian classifiers
    Tsymbal, A
    Puuronen, S
    Patterson, D
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2002, 2366 : 592 - 600
  • [47] The influence of flow model selection on finite element model parameter estimation using Bayesian inference
    Hadwin, Paul J.
    Erath, Byron D.
    Peterson, Sean D.
    JASA EXPRESS LETTERS, 2021, 1 (04)
  • [48] Bayesian input selection for neural network classifiers
    Verrelst, H
    Vandewalle, J
    De Moor, B
    Timmerman, D
    PROCEEDING OF THE THIRD INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND EXPERT SYSTEMS IN MEDICINE AND HEALTHCARE, 1998: 125 - 132
  • [49] Selection of human embryos for transfer by Bayesian classifiers
    Morales, Dinora A.
    Bengoetxea, Endika
    Larranaga, Pedro
    COMPUTERS IN BIOLOGY AND MEDICINE, 2008, 38 (11-12) : 1177 - 1186
  • [50] Objective Bayesian group variable selection for linear model
    Kang, Sang Gil
    Lee, Woo Dong
    Kim, Yongku
    COMPUTATIONAL STATISTICS, 2022, 37 (03) : 1287 - 1310