Principal component analysis of incomplete data – A simple solution to an old problem

Cited by: 17
Authors
Podani, Janos [1]
Kalapos, Tibor [1]
Barta, Barbara [2]
Schmera, Denes [2]
Affiliations
[1] Eotvos Lorand Univ, Inst Biol, Dept Plant Systemat Ecol & Theoret Biol, Pazmany Ps 1-C, H-1117 Budapest, Hungary
[2] Balaton Limnol Inst, Ctr Ecol Res, Klebelsberg Ku 3, H-8237 Tihany, Hungary
Keywords
Biplot; Correlation; Functional trait; Missing data; Morphometry; Ordination;
DOI
10.1016/j.ecoinf.2021.101235
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Discipline classification codes
071012; 0713;
Abstract
A long-standing problem in biological data analysis is the unintentional absence of values for some observations or variables, preventing the use of standard multivariate exploratory methods, such as principal component analysis (PCA). Solutions include deleting parts of the data by which information is lost, data imputation, which is always arbitrary, and restriction of the analysis to either the variables or observations, thereby losing the advantages of biplot diagrams. We describe a minor modification of eigenanalysis-based PCA in which correlations or covariances are calculated using different numbers of observations for each pair of variables, and the resulting eigenvalues and eigenvectors are used to calculate component scores such that missing values are skipped. This procedure avoids artificial data imputation, exhausts all information from the data and allows the preparation of biplots for the simultaneous display of the ordination of variables and observations. The use of the modified PCA, called InDaPCA (PCA of Incomplete Data), is demonstrated on actual biological examples: leaf functional traits of plants, functional traits of invertebrates, cranial morphometry of crocodiles and fish hybridization data, with biologically meaningful results. Our study suggests that it is not the percentage of missing entries in the data matrix that matters; the success of InDaPCA is mostly affected by the minimum number of observations available for comparing a given pair of variables. In the present study, interpretation of results in the space of the first two components was not hindered, however.
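The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `indapca_sketch`, the use of NumPy, the choice of correlations rather than covariances, and the way scores are formed (standardizing each variable by its own observed mean and standard deviation, then letting missing entries contribute nothing to the sum) are all assumptions made for illustration, and the published method may differ in detail.

```python
import numpy as np


def indapca_sketch(X):
    """Hypothetical sketch: PCA on a matrix with NaNs via pairwise-complete correlations."""
    X = np.asarray(X, dtype=float)
    n_obs, n_var = X.shape
    observed = ~np.isnan(X)

    # Correlation of each variable pair, computed only from the
    # observations in which both variables are present.
    R = np.eye(n_var)
    pair_counts = np.zeros((n_var, n_var), dtype=int)
    for j in range(n_var):
        for k in range(j, n_var):
            both = observed[:, j] & observed[:, k]
            pair_counts[j, k] = pair_counts[k, j] = both.sum()
            if j == k:
                continue
            if both.sum() < 2:
                raise ValueError(f"variables {j} and {k} share fewer than 2 observations")
            R[j, k] = R[k, j] = np.corrcoef(X[both, j], X[both, k])[0, 1]

    # Eigenanalysis of the symmetric correlation matrix, largest eigenvalue first.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumption: standardize each variable with its own observed values.
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)

    # Component scores: missing entries are skipped, i.e. they add zero to the sum.
    scores = np.where(observed, Z, 0.0) @ eigvecs
    return eigvals, eigvecs, scores, pair_counts


# Example with made-up data: 30 observations, 4 variables, a few missing entries.
rng = np.random.default_rng(0)
demo = rng.normal(size=(30, 4))
demo[0, 1] = np.nan
demo[5, 2:] = np.nan
eigvals, eigvecs, scores, pair_counts = indapca_sketch(demo)
print("smallest number of shared observations:", pair_counts.min())
```

The returned `pair_counts` matrix makes the abstract's closing point concrete: its minimum is the smallest number of observations available for any pair of variables, which the study identifies as more important than the overall share of missing entries. A correlation matrix assembled pairwise is not guaranteed to be positive semidefinite, so small negative eigenvalues can appear in the trailing components.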
Pages: 9