Validation tools for variable subset regression

Cited by: 3

Authors:

Knut Baumann

Nikolaus Stiefl

Affiliation:

[1] University of Wuerzburg, Department of Pharmacy and Food Chemistry

Source:

Journal of Computer-Aided Molecular Design | 2004 / Vol. 18

Keywords:

chance correlation; cross-validation; validation; variable selection

DOI:

Not available

Abstract:

Variable selection is frequently applied in QSAR research. Because the selection process influences the characteristics of the finally chosen model, thorough validation of the selection technique is essential. Here, a validation protocol is presented briefly, and two of the tools that form part of this protocol are introduced in more detail. The first tool, which is based on permutation testing, makes it possible to assess the inflation of internal figures of merit (such as the cross-validated prediction error). The second tool, based on noise addition, can be used to determine the complexity, and with it the stability, of models generated by variable selection. The statistical information obtained is important in deciding whether or not to trust the predictive ability of a specific model. The graphical output of the validation tools is easily accessible and provides a reliable impression of model performance. Among other things, the tools were employed to study the influence of leave-one-out and leave-multiple-out cross-validation on model characteristics; this confirmed that leave-multiple-out cross-validation yields more stable models. To study the performance of the entire validation protocol, it was applied to eight different QSAR data sets with default settings. In all cases, internal and external model performance was good, indicating that the protocol serves its purpose quite well.
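
The permutation-testing idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: it assumes a deliberately simple selection step (pick the single descriptor with the best leave-one-out q²), and the function names `loo_q2`, `select_best`, and `permutation_test` are hypothetical. The response is shuffled, the whole selection is repeated on each shuffled copy, and the resulting distribution of "best" cross-validated q² values indicates how much of the figure of merit could arise from selection alone.

```python
import random


def loo_q2(x, y):
    """Leave-one-out cross-validated q^2 for the one-variable model y = a + b*x."""
    n = len(y)
    y_mean = sum(y) / n
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        # Fit on all objects except object i.
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        mx = sum(xs) / (n - 1)
        my = sum(ys) / (n - 1)
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Predict the held-out object and accumulate its squared error.
        pred = my + slope * (x[i] - mx)
        press += (y[i] - pred) ** 2
    return 1.0 - press / ss_tot


def select_best(columns, y):
    """Assumed selection step: return (index, q2) of the best single descriptor."""
    scores = [loo_q2(col, y) for col in columns]
    best = max(range(len(columns)), key=scores.__getitem__)
    return best, scores[best]


def permutation_test(columns, y, n_perm=200, seed=0):
    """Repeat the full selection on shuffled responses.

    Returns the q2 on the real response, the list of best q2 values on
    shuffled responses, and an empirical p-value.
    """
    rng = random.Random(seed)
    _, q2_real = select_best(columns, y)
    q2_perm = []
    for _ in range(n_perm):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)
        # Selection is redone inside the loop, so its optimism is included.
        q2_perm.append(select_best(columns, y_shuffled)[1])
    exceed = sum(q >= q2_real for q in q2_perm)
    p_value = (1 + exceed) / (n_perm + 1)
    return q2_real, q2_perm, p_value
```

Here `columns` is a list of descriptor columns (each a list of floats) and `y` is the response list. The key design point is that the selection step sits inside the permutation loop; scoring a fixed, already selected model against shuffled responses would not capture the inflation caused by the search itself.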

Pages: 549-562

Page count: 13
