Robust Cross-Validation of Linear Regression QSAR Models

Times cited: 49
Authors
Konovalov, Dmitry A. [1]
Llewellyn, Lyndon E. [2]
Vander Heyden, Yvan [3]
Coomans, Danny [1]
Affiliations
[1] James Cook Univ, Sch Math Phys & Informat Technol, Townsville, Qld 4811, Australia
[2] Australian Inst Marine Sci, PMB 3, Townsville, Qld 4810, Australia
[3] Vrije Univ Brussel, Dept Analyt Chem & Pharmaceut Technol, Inst Pharmaceut, B-1050 Brussels, Belgium
DOI: 10.1021/ci800209k
CLC classification: R914 [Medicinal chemistry]
Subject classification code: 100701
Abstract
A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from their molecular structures. "The gold standard" of model validation is blindfold prediction, in which the model's predictive power is assessed by how well it predicts the activity values of compounds that were not considered in any way during model development or calibration. During the development of a QSAR model, however, some indication of the model's predictive power is still needed, and this is often obtained by some form of cross-validation (CV). In this study, the concepts of predictive power and fitting ability of a multiple linear regression (MLR) QSAR model were examined in the CV context, allowing for the presence of outliers. Commonly used predictive-power and fitting-ability statistics were assessed via Monte Carlo cross-validation on the percent human intestinal absorption, blood-brain partition coefficient, and saxitoxin toxicity QSAR data sets, as well as on three benchmark data sets with known outlier contamination. It was found that (1) a robust version of MLR should always be preferred over ordinary-least-squares MLR, regardless of the degree of outlier contamination, and (2) the model's predictive power should only be assessed via robust statistics. The MATLAB and Java source code used in this study is freely available for academic use from the QSAR-BENCH section of www.dmitrykonovalov.org. The Web site also hosts the Java-based QSAR-BENCH program, which can be run online via Java Web Start (supporting Windows, Mac OS X, and Linux/Unix) to reproduce most of the reported results or to apply the reported procedures to other data sets.
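The comparison described in the abstract (Monte Carlo CV of ordinary-least-squares MLR versus a robust MLR, scored by a non-robust and a robust statistic) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' QSAR-BENCH code: the robust fit is a Huber M-estimator solved by iteratively reweighted least squares (c = 1.345, scale = median absolute residual / 0.6745), chosen here as a common textbook robust regression and not necessarily the variant used in the paper. The median absolute prediction error stands in for the paper's robust statistics, and RMSEP for a non-robust one. All function names (`fit_ols`, `fit_huber`, `mccv`, etc.) and the synthetic one-descriptor data are hypothetical.

```python
import random
import statistics


def solve(A, b):
    """Solve the square linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def wls(X, y, w):
    """Weighted least squares with an intercept, via the weighted normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    A = [[sum(wi * zi[j] * zi[k] for wi, zi in zip(w, Z)) for k in range(p)] for j in range(p)]
    b = [sum(wi * zi[j] * yi for wi, zi, yi in zip(w, Z, y)) for j in range(p)]
    return solve(A, b)


def predict(beta, X):
    """Linear predictions b0 + sum_j b_j * x_j for each row of X."""
    return [beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], row)) for row in X]


def fit_ols(X, y):
    """Ordinary-least-squares MLR (WLS with unit weights)."""
    return wls(X, y, [1.0] * len(y))


def fit_huber(X, y, c=1.345, iters=50):
    """Huber M-estimator fitted by IRLS, starting from the OLS solution (assumed robust MLR)."""
    beta = fit_ols(X, y)
    for _ in range(iters):
        r = [yi - fi for yi, fi in zip(y, predict(beta, X))]
        # Robust scale: median absolute residual / 0.6745, guarded against zero.
        scale = statistics.median(abs(ri) for ri in r) / 0.6745 or 1e-12
        # Huber weights: 1 within +/- c*scale, c*scale/|r| outside.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        beta = wls(X, y, w)
    return beta


def mccv(X, y, fit, n_splits=100, test_frac=0.3, seed=1):
    """Monte Carlo CV: repeated random train/test splits; returns (RMSEP, median |error|)."""
    rng = random.Random(seed)  # same seed -> identical splits for every fit method
    idx = list(range(len(y)))
    n_test = max(1, round(test_frac * len(y)))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        beta = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(beta, [X[i] for i in test])
        errors.extend(y[i] - p for i, p in zip(test, preds))
    rmsep = (sum(e * e for e in errors) / len(errors)) ** 0.5
    med_abs = statistics.median(abs(e) for e in errors)
    return rmsep, med_abs


# Hypothetical data: y = 2 + 3x + noise, with 4 of 40 responses shifted by +30 (outliers).
data_rng = random.Random(0)
X = [[data_rng.uniform(0, 10)] for _ in range(40)]
y = [2 + 3 * row[0] + data_rng.gauss(0, 0.5) for row in X]
for i in (3, 11, 25, 37):
    y[i] += 30.0

ols = mccv(X, y, fit_ols)
rob = mccv(X, y, fit_huber)
print("OLS   RMSEP=%.2f  median|e|=%.2f" % ols)
print("Huber RMSEP=%.2f  median|e|=%.2f" % rob)
```

On this toy data the median absolute prediction error should be clearly smaller for the Huber fit. RMSEP, by contrast, is dominated by the few outliers that land in each test set, so it may make the two fits look much more alike, which is the kind of behaviour that motivates scoring predictive power with robust statistics.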
Pages: 2081-2094 (14 pages)