Validation of chemometric models - A tutorial

Cited by: 233
Authors
Westad, Frank [1 ]
Marini, Federico [2 ]
Affiliations
[1] CAMO Software AS, N-0158 Oslo, Norway
[2] Univ Roma La Sapienza, Dept Chem, I-00185 Rome, Italy
Keywords
Validation; Chemometrics; Resampling; Test set; Cross-validation; ARTIFICIAL NEURAL-NETWORKS; VARIABLE SELECTION; MULTIVARIATE CALIBRATION; SPECTROSCOPY; PREDICTION; REGRESSION; CLASSIFICATION; ALGORITHM; DESIGN; ERRORS;
DOI
10.1016/j.aca.2015.06.056
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
In this tutorial, we focus on validation from both a numerical and a conceptual point of view. The procedure often reported in the literature of (repeatedly) dividing a dataset randomly into a calibration set and a test set must be applied with care. It can only be justified when there is no systematic stratification of the objects that will affect the validated estimates or figures of merit such as RMSE or R². The various levels of validation may, typically, be repeatability, reproducibility, and instrument and raw material variation. Examples of how one data set can be validated at these different levels of background information illustrate that the chosen level affects the figures of merit as well as the dimensionality of the models. Even more important is the robustness of the models for predicting future samples. Another aspect brought to attention is validation in terms of the overall conclusions drawn when observing a specific system. One example is to apply several methods for finding the significant variables and to see whether there is a consensus subset that also matches what is reported in the literature or what is expected from the underlying chemistry. © 2015 Elsevier B.V. All rights reserved.
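The caution about random calibration/test splits can be illustrated with a minimal sketch. It is not taken from the paper: the simulated data, the 1-nearest-neighbour predictor, and all function names (`make_data`, `random_folds`, `grouped_folds`, `cv_rmse`) are assumptions chosen only for illustration. Each simulated physical sample is measured three times (replicates), so the objects are stratified by sample; a random split lets replicates of the same sample fall on both sides of the split, while a grouped split keeps them together.

```python
import math
import random


def make_data(n_samples=20, n_reps=3, seed=0):
    """Simulate replicate measurements: rows of (sample_id, x, y)."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_samples):
        x_true = rng.uniform(0.0, 10.0)
        # Sample-specific deviation that x alone cannot explain.
        y_true = 2.0 * x_true + rng.gauss(0.0, 3.0)
        for _ in range(n_reps):
            # Replicates differ only by small measurement noise.
            rows.append((g, x_true + rng.gauss(0.0, 0.01),
                         y_true + rng.gauss(0.0, 0.05)))
    return rows


def predict_1nn(train, x):
    """Predict y from the training row whose x is closest."""
    nearest = min(train, key=lambda r: abs(r[1] - x))
    return nearest[2]


def cv_rmse(rows, folds):
    """Cross-validated RMSE for a list of test-index folds."""
    squared_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        for i in test_idx:
            _, x, y = rows[i]
            squared_errors.append((predict_1nn(train, x) - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def random_folds(n, k, seed=1):
    """Random k-fold split that ignores which sample a row belongs to."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grouped_folds(rows, k, seed=1):
    """k-fold split that keeps all replicates of a sample in one fold."""
    groups = sorted({r[0] for r in rows})
    random.Random(seed).shuffle(groups)
    fold_of = {g: j % k for j, g in enumerate(groups)}
    folds = [[] for _ in range(k)]
    for i, r in enumerate(rows):
        folds[fold_of[r[0]]].append(i)
    return folds


rows = make_data()
rmse_random = cv_rmse(rows, random_folds(len(rows), 5))
rmse_grouped = cv_rmse(rows, grouped_folds(rows, 5))
print(f"random split RMSECV:  {rmse_random:.3f}")
print(f"grouped split RMSECV: {rmse_grouped:.3f}")
```

In this toy setting the random split mostly measures replicate repeatability, because the nearest training object is usually another replicate of the same sample, whereas the grouped split estimates the error for a genuinely new sample. Which level is appropriate depends on what the model is expected to predict in future use.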
Pages: 14-24
Number of pages: 11
Related papers
50 in total
  • [31] Tutorial: Post-Silicon Validation and Diagnosis
    Basu, Kanad
    Kundu, Subhadip
    [J]. 2016 29TH INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2016 15TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID), 2016, : 9 - 10
  • [32] Shared memory consistency models: A tutorial
    Adve, SV
    Gharachorloo, K
    [J]. COMPUTER, 1996, 29 (12) : 66 - &
  • [33] Models for UWB propagation channels - A tutorial
    Molisch, AF
    [J]. DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2005, 12 (03): : 297 - 320
  • [34] Tutorial: Delay fault models and coverage
    Majhi, AK
    Agrawal, VD
    [J]. ELEVENTH INTERNATIONAL CONFERENCE ON VLSI DESIGN, PROCEEDINGS, 1997, : 364 - 369
  • [35] Comparing classification models—a practical tutorial
    W. Patrick Walters
    [J]. Journal of Computer-Aided Molecular Design, 2022, 36 : 381 - 389
  • [36] Ordinal Regression Models in Psychology: A Tutorial
    Buerkner, Paul-Christian
    Vuorre, Matti
    [J]. ADVANCES IN METHODS AND PRACTICES IN PSYCHOLOGICAL SCIENCE, 2019, 2 (01) : 77 - 101
  • [37] Tutorial on Large Language Models for Recommendation
    Hua, Wenyue
    Li, Lei
    Xu, Shuyuan
    Chen, Li
    Zhang, Yongfeng
    [J]. PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 1281 - 1283
  • [38] Conformational analysis of lignin models: a chemometric approach
    Eduardo W. Castilho-Almeida
    Wagner B. De Almeida
    Hélio F. Dos Santos
    [J]. Journal of Molecular Modeling, 2013, 19 : 2149 - 2163
  • [39] A tutorial on calibration measurements and calibration models for clinical prediction models
    Huang, Yingxiang
    Li, Wentao
    Macheret, Fima
    Gabriel, Rodney A.
    Ohno-Machado, Lucila
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2020, 27 (04) : 621 - 633
  • [40] Conformational analysis of lignin models: a chemometric approach
    Castilho-Almeida, Eduardo W.
    De Almeida, Wagner B.
    Dos Santos, Helio F.
    [J]. JOURNAL OF MOLECULAR MODELING, 2013, 19 (05) : 2149 - 2163