Validation tools for variable subset regression

Cited by: 3
Authors
Knut Baumann
Nikolaus Stiefl
Affiliations
[1] University of Wuerzburg, Department of Pharmacy and Food Chemistry
Keywords
chance correlation; cross-validation; validation; variable selection
DOI
Not available
Abstract
Variable selection is applied frequently in QSAR research. Since the selection process influences the characteristics of the finally chosen model, thorough validation of the selection technique is very important. Here, a validation protocol is presented briefly, and two of the tools that are part of this protocol are introduced in more detail. The first tool, which is based on permutation testing, makes it possible to assess the inflation of internal figures of merit (such as the cross-validated prediction error). The other tool, based on noise addition, can be used to determine the complexity, and with it the stability, of models generated by variable selection. The statistical information obtained is important in deciding whether or not to trust the predictive ability of a specific model. The graphical output of the validation tools is easily accessible and provides a reliable impression of model performance. Among other applications, the tools were employed to study the influence of leave-one-out and leave-multiple-out cross-validation on model characteristics. Here, it was confirmed that leave-multiple-out cross-validation yields more stable models. To study the performance of the entire validation protocol, it was applied to eight different QSAR data sets with default settings. In all cases, internal and external model performance was good, indicating that the protocol serves its purpose.
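The permutation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the selection rule (keep the k columns most correlated with the response), the ordinary least squares model, and all function names (`cv_q2`, `select_top_k`, `permutation_q2`) are assumptions chosen for illustration. The sketch repeats the whole selection-plus-cross-validation procedure on randomly permuted responses, so the spread of the permuted q² values suggests how much of the internal figure of merit could arise from the selection step alone.

```python
import numpy as np

def cv_q2(X, y, n_folds=5, rng=None):
    """Leave-multiple-out cross-validated q^2 = 1 - PRESS / SS_total for OLS."""
    rng = np.random.default_rng(rng)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    press = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef
        press += np.sum((y[test] - pred) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def select_top_k(X, y, k):
    """Toy variable selection (an assumption): k columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[-k:]

def permutation_q2(X, y, k=3, n_perm=50, seed=0):
    """Return q^2 for the real response and for n_perm permuted responses.

    Selection is rerun for every permuted response, so the permuted q^2
    values include the optimism introduced by the selection step.
    """
    rng = np.random.default_rng(seed)
    cols = select_top_k(X, y, k)
    q2_real = cv_q2(X[:, cols], y, rng=rng)
    q2_perm = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)
        cols_p = select_top_k(X, y_perm, k)
        q2_perm[i] = cv_q2(X[:, cols_p], y_perm, rng=rng)
    return q2_real, q2_perm

# Synthetic example: 40 objects, 200 candidate variables, 2 informative ones.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=40)
q2_real, q2_perm = permutation_q2(X, y)
print(f"q2 (real response): {q2_real:.2f}")
print(f"q2 (permuted), median: {np.median(q2_perm):.2f}, max: {q2_perm.max():.2f}")
```

In this sketch the variables are selected on the full data set before cross-validation, which is exactly the situation in which an internal figure of merit tends to be inflated. A real q² that does not clearly exceed the permuted distribution would be a warning sign of chance correlation.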
Pages: 549-562
Number of pages: 13
Related papers
50 in total
  • [41] SUBSET-SELECTION IN REGRESSION - MILLER,AJ
    HALDAR, S
    JOURNAL OF MARKETING RESEARCH, 1992, 29 (02) : 270 - 272
  • [42] Tools for verification and validation
    2005, Springer Verlag, Heidelberg, D-69121, Germany (3436 LNCS):
  • [43] Variable importance in latent variable regression models
    Kvalheim, Olav M.
    Arneberg, Reidar
    Bleie, Olav
    Rajalahti, Tarja
    Smilde, Age K.
    Westerhuis, Johan A.
    JOURNAL OF CHEMOMETRICS, 2014, 28 (08) : 615 - 622
  • [44] Iteratively variable subset optimization for multivariate calibration
    Wang, Weiting
    Yun, Yonghuan
    Deng, Baichuan
    Fan, Wei
    Liang, Yizeng
    RSC ADVANCES, 2015, 5 (116) : 95771 - 95780
  • [45] Variable selection with stepwise and best subset approaches
    Zhang, Zhongheng
    ANNALS OF TRANSLATIONAL MEDICINE, 2016, 4 (07)
  • [46] Embedding Boosted Regression Trees approach to variable selection and cross-validation in parametric regression to predict diameter distribution after thinning
    Lin, Ho-Tung
    Lam, Tzeng Yih
    Peng, Ping-Hsun
    Chiu, Chih-Ming
    FOREST ECOLOGY AND MANAGEMENT, 2021, 499 (499)
  • [47] Estimating LAD regression coefficients with best subset points
    Choi, Hyun Jip
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2008, 37 (09) : 1799 - 1809
  • [48] Conditional Uncorrelation and Efficient Subset Selection in Sparse Regression
    Wang, Jianji
    Zhang, Shupei
    Liu, Qi
    Du, Shaoyi
    Guo, Yu-Cheng
    Zheng, Nanning
    Wang, Fei-Yue
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (10) : 10458 - 10467
  • [49] INFLATION OF R2 IN BEST SUBSET REGRESSION
    RENCHER, AC
    PUN, FC
    TECHNOMETRICS, 1980, 22 (01) : 49 - 53
  • [50] Complete subset least squares support vector regression
    Qiu, Yue
    ECONOMICS LETTERS, 2021, 200