Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error

Cited by: 56
Authors
Emmert-Streib, Frank [1 ,2 ]
Dehmer, Matthias [3 ,4 ,5 ]
Affiliations
[1] Tampere Univ, Predict Soc & Data Analyt Lab, Fac Informat Technolgy & Commun Sci, Tampere 33100, Finland
[2] Inst Biosci & Med Technol, Tampere 33520, Finland
[3] Univ Appl Sci Upper Austria, Inst Intelligent Prod, Fac Management, Steyr Campus, A-4400 Steyr, Austria
[4] Univ Hlth Sci Med Informat & Technol, Dept Mechatron & Biomed Comp Sci, A-6060 Hall In Tirol, Austria
[5] Nankai Univ, Coll Comp & Control Engn, Tianjin 300071, Peoples R China
Source
Funding
Austrian Science Fund;
Keywords
machine learning; statistics; model selection; model assessment; regression models; high-dimensional data; data science; bias-variance tradeoff; generalization error; BIG DATA; SOCIAL-SCIENCE; BIAS; REGULARIZATION; PERFORMANCE; SHRINKAGE; ANALYTICS; LASSO;
DOI
10.3390/make1010032
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When performing a regression or classification analysis, one needs to specify a statistical model. This model should avoid both overfitting and underfitting the data, and achieve a low generalization error, which characterizes its prediction performance. To identify such a model, one needs to decide which model to select from candidate model families based on performance evaluations. In this paper, we review the theoretical framework of model selection and model assessment, including error-complexity curves, the bias-variance tradeoff, and learning curves for evaluating statistical models. We discuss criterion-based, step-wise selection procedures and resampling methods for model selection, where cross-validation provides the simplest and most generic means for computationally estimating all required quantities. To make the theoretical concepts transparent, we present worked examples for linear regression models. However, our conceptual presentation is extensible to more general models, as well as classification problems.
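The abstract's central idea, using cross-validation to estimate prediction error and select among candidate regression models, can be illustrated with a minimal sketch. This is not the paper's own code: the synthetic data, the two candidate models (intercept-only and simple linear regression), and the helper names `fit_mean`, `fit_line`, and `kfold_mse` are all assumptions chosen for illustration.

```python
import random

def fit_mean(xs, ys):
    """Intercept-only model: predicts the training mean for every x."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple linear regression y = a + b*x via closed-form least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def kfold_mse(fit, xs, ys, k=5, seed=0):
    """Estimate prediction error as the mean squared error over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    sq_err, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            sq_err += (ys[i] - model(xs[i])) ** 2
            count += 1
    return sq_err / count

# Synthetic data (assumption): y = 1 + 2x + Gaussian noise with sd 0.5
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

cv_scores = {
    "mean": kfold_mse(fit_mean, xs, ys),
    "line": kfold_mse(fit_line, xs, ys),
}
# Model selection: pick the candidate with the lowest cross-validated error.
best = min(cv_scores, key=cv_scores.get)
print(cv_scores, best)
```

Because the data here are generated from a linear relationship, the intercept-only model underfits and the linear model should obtain the lower cross-validated error. The same loop extends to richer candidate families by swapping in other `fit` functions.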
Pages: 521 - 551
Page count: 30
Related papers
50 in total
  • [1] Model selection in quantile regression models
    Alhamzawi, Rahim
    [J]. JOURNAL OF APPLIED STATISTICS, 2015, 42 (02) : 445 - 458
  • [2] Model Selection for Logistic Regression Models
    Duller, Christine
    [J]. NUMERICAL ANALYSIS AND APPLIED MATHEMATICS (ICNAAM 2012), VOLS A AND B, 2012, 1479 : 414 - 416
  • [3] Privacy-Preserving Evaluation of Generalization Error and Its Application to Model and Attribute Selection
    Sakuma, Jun
    Wright, Rebecca N.
    [J]. ADVANCES IN MACHINE LEARNING, PROCEEDINGS, 2009, 5828 : 338 - +
  • [4] MODEL SELECTION CRITERIA AND MODEL SELECTION TESTS IN REGRESSION-MODELS
    TERASVIRTA, T
    MELLIN, I
    [J]. SCANDINAVIAN JOURNAL OF STATISTICS, 1986, 13 (03) : 159 - 171
  • [5] Bayesian model selection for sand with generalization ability evaluation
    Jin, Yin-Fu
    Yin, Zhen-Yu
    Zhou, Wan-Huan
    Shao, Jian-Fu
    [J]. INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, 2019, 43 (14) : 2305 - 2327
  • [6] UNBIASED ESTIMATE OF GENERALIZATION ERROR AND MODEL SELECTION IN NEURAL-NETWORK
    LIU, Y
    [J]. NEURAL NETWORKS, 1995, 8 (02) : 215 - 219
  • [7] BAYESIAN ERROR ESTIMATION AND MODEL SELECTION IN SPARSE LOGISTIC REGRESSION
    Huttunen, Heikki
    Manninen, Tapio
    Tohka, Jussi
    [J]. 2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2013,
  • [8] Training Data Subset Selection for Regression With Controlled Generalization Error
    Sivasubramanian, Durga
    Iyer, Rishabh
    Ramakrishnan, Ganesh
    De, Abir
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Regression Model for Better Generalization and Regression Analysis
    Khan, Mohiuddeen
    Srivastava, Kanishk
    [J]. ICMLSC 2020: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND SOFT COMPUTING, 2020, : 30 - 33
  • [10] Model Selection for Exponential Power Mixture Regression Models
    Jiang, Yunlu
    Liu, Jiangchuan
    Zou, Hang
    Huang, Xiaowen
    [J]. ENTROPY, 2024, 26 (05)