Validation tools for variable subset regression

Cited by: 3
Authors
Knut Baumann
Nikolaus Stiefl
Affiliation
[1] University of Wuerzburg, Department of Pharmacy and Food Chemistry
Keywords
chance correlation; cross-validation; validation; variable selection
DOI
Not available
Abstract
Variable selection is applied frequently in QSAR research. Since the selection process influences the characteristics of the finally chosen model, thorough validation of the selection technique is very important. Here, a validation protocol is presented briefly, and two of the tools that are part of this protocol are introduced in more detail. The first tool, which is based on permutation testing, makes it possible to assess the inflation of internal figures of merit (such as the cross-validated prediction error). The other tool, based on noise addition, can be used to determine the complexity, and with it the stability, of models generated by variable selection. The obtained statistical information is important in deciding whether or not to trust the predictive abilities of a specific model. The graphical output of the validation tools is easily accessible and provides a reliable impression of model performance. Among other things, the tools were employed to study the influence of leave-one-out and leave-multiple-out cross-validation on model characteristics. Here, it was confirmed that leave-multiple-out cross-validation yields more stable models. To study the performance of the entire validation protocol, it was applied to eight different QSAR data sets with default settings. In all cases, internal and external model performance was good, indicating that the protocol serves its purpose quite well.
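The permutation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`loo_press`, `best_q2`, `permutation_test`), the one-descriptor selection rule, and the synthetic data are all assumptions made for illustration. The sketch repeats the whole selection procedure on randomly permuted responses, so the resulting null distribution shows how high the leave-one-out q² can climb through selection alone.

```python
import random

def loo_press(x, y):
    """Leave-one-out PRESS for a one-descriptor least-squares line."""
    n = len(y)
    press = 0.0
    for i in range(n):
        xt = x[:i] + x[i + 1:]          # training descriptor values
        yt = y[:i] + y[i + 1:]          # training responses
        mx = sum(xt) / (n - 1)
        my = sum(yt) / (n - 1)
        sxx = sum((a - mx) ** 2 for a in xt)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xt, yt))
        slope = sxy / sxx if sxx > 0 else 0.0
        pred = my + slope * (x[i] - mx)  # prediction for the left-out object
        press += (y[i] - pred) ** 2
    return press

def best_q2(columns, y):
    """Hypothetical selection step: keep the descriptor with the highest LOO q2."""
    my = sum(y) / len(y)
    ss_tot = sum((v - my) ** 2 for v in y)
    return max(1.0 - loo_press(col, y) / ss_tot for col in columns)

def permutation_test(columns, y, n_perm=200, seed=0):
    """Rerun the selection on permuted responses to build a null distribution."""
    rng = random.Random(seed)
    observed = best_q2(columns, y)
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)              # destroy any real x-y relationship
        null.append(best_q2(columns, y_perm))
    # Fraction of permuted runs that match or beat the real q2
    p_value = (1 + sum(q >= observed for q in null)) / (n_perm + 1)
    return observed, null, p_value

# Synthetic data (assumption): 30 objects, 20 descriptors, only the first is informative
data_rng = random.Random(1)
n_obj, n_desc = 30, 20
columns = [[data_rng.gauss(0, 1) for _ in range(n_obj)] for _ in range(n_desc)]
y = [2.0 * columns[0][i] + data_rng.gauss(0, 0.5) for i in range(n_obj)]

observed, null, p_value = permutation_test(columns, y)
print("observed q2:", round(observed, 3))
print("largest q2 on permuted responses:", round(max(null), 3))
print("permutation p-value:", round(p_value, 4))
```

The key design choice is that the selection step sits inside the permutation loop. Permuting the responses only after a descriptor has been chosen would hide exactly the selection-induced inflation that the test is meant to expose.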
Pages: 549-562
Page count: 13
Published in: Journal of Computer-Aided Molecular Design, 2004, 18 (7-9): 549-562