Dynamic determination of the dimension of PCA calibration models using F-statistics

Cited by: 18
Authors
Vogt, F [1]
Mizaikoff, B [1]
Affiliations
[1] Georgia Inst Technol, Sch Chem & Biochem, Atlanta, GA 30332 USA
Keywords
principal component analysis/regression (PCA/PCR); dimension of calibration models; dynamic PCA model adjustment; F-statistics; optical spectra
DOI
10.1002/cem.813
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Owing to experimental measurement errors, determining the proper dimension of calibration models is difficult. Cross-validation is a common approach for this purpose; however, if data evaluation is based on PCA only, without consideration of sample concentrations, this computationally expensive method cannot be applied. In this study, a statistical method for determining the proper dimension of PCA calibration models is presented from the viewpoint of multivariate regression analysis, considering only measured data. In this iterative algorithm, individual principal components (PCs) are included stepwise in a reduced model, which is subsequently tested against the full model including all PCs. The comparison is performed by an F-test on estimates of the residual variance of a measurement spectrum determined from the reduced and the full model. This approach detects a lack of fit due to an insufficient number of PCs. If no lack of fit is evident for a certain reduced model, a sufficiently large model is considered to have been found and inclusion of additional PCs stops. Hence the resulting reduced calibration model includes only statistically significant PCs and determines the minimum number of PCs required for a given measurement spectrum. For optimized data evaluation, the algorithm can be applied individually to every measured data vector, such as an optical spectrum of a chemical analyte. The proposed algorithm is initially investigated using simulated data and subsequently applied to three different experimental sets of spectra. It is shown that for synthetic data at reasonable noise levels the correct number of PCs can be determined in most cases. The experimental examples demonstrate that the number of PCs determined by the proposed algorithm is slightly larger than a user would select manually by subjective visual inspection. As one result, the algorithm is able to detect small but significant spectroscopic features of experimental data which would otherwise be neglected. Copyright (C) 2003 John Wiley & Sons, Ltd.
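The stepwise procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact test statistic, so the sketch assumes the textbook nested-model F-test (extra sum of squares of the reduced model over the residual variance of the full model), mean-centering of the calibration spectra, and a significance level `alpha` of 0.05. The function name `select_num_pcs` and all parameter names are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def select_num_pcs(calibration, spectrum, alpha=0.05):
    """Return the smallest number of PCs showing no significant lack of fit.

    calibration : (n_samples, n_channels) array of calibration spectra
    spectrum    : (n_channels,) measured spectrum to evaluate
    alpha       : significance level of the F-test (assumed value)
    """
    calibration = np.asarray(calibration, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    mean = calibration.mean(axis=0)  # assumption: mean-centered PCA

    # Principal components = right singular vectors of the centered data.
    _, s, vt = np.linalg.svd(calibration - mean, full_matrices=False)
    rank = int(np.sum(s > s[0] * 1e-10))  # drop numerically null directions
    loadings = vt[:rank]                  # orthonormal rows, one per PC

    x = spectrum - mean
    scores = loadings @ x                 # projection onto every PC

    # Full model: all PCs. Its residual estimates the noise variance.
    residual_full = x - loadings.T @ scores
    df_full = x.size - rank
    rss_full = float(residual_full @ residual_full)
    if df_full <= 0 or rss_full == 0.0:
        raise ValueError("full model leaves no residual to estimate noise from")
    var_full = rss_full / df_full

    for k in range(rank):
        # Extra residual of the k-PC reduced model = energy in the omitted PCs.
        extra_ss = float(np.sum(scores[k:] ** 2))
        f_value = (extra_ss / (rank - k)) / var_full
        p_value = f_dist.sf(f_value, rank - k, df_full)
        if p_value > alpha:  # no significant lack of fit: stop adding PCs
            return k
    return rank
```

Because the PCs are orthonormal, the difference between the reduced-model and full-model residual sums of squares equals the summed squared scores on the omitted PCs, which is why the loop never has to refit anything. The function is called once per measured spectrum, so different spectra evaluated against the same calibration set may end up with different model dimensions.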
Pages: 346-357
Number of pages: 12