Testing of the Effect of Missing Data Estimation and Distribution in Morphometric Multivariate Data Analyses

Times cited: 62
Authors
Brown, Caleb Marshall [1, 2]
Arbour, Jessica H. [2]
Jackson, Donald A. [2]
Affiliations
[1] Royal Ontario Museum, Dept Nat Hist Palaeobiol, Toronto, ON M5S 2C6, Canada
[2] Univ Toronto, Dept Ecol & Evolutionary Biol, Toronto, ON M5S 3B2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Crocodilia; deformation; fossil; incomplete; morphology; ordination; PCA; Procrustes; shape; taxonomy; relative growth; phylogeny
DOI
10.1093/sysbio/sys047
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Missing data are an unavoidable problem in biological data sets, and the performance of missing-data deletion and estimation techniques in morphometric data sets is poorly understood. Here, a novel method is used to measure the error introduced by multiple techniques on a representative sample. A large sample of extant crocodilian skulls was measured and analyzed with principal component analysis (PCA). Twenty-three different proportions of missing data were introduced into the data set, estimated, analyzed, and compared with the original result using Procrustes superimposition. Previous work on the effects of missing data removed values at random, a non-biological pattern. Here, missing data were introduced into the data set using three methodologies: purely at random, as a function of the Euclidean distance between respective measurements (simulating anatomical regions), and as a function of the portion of the sample occupied by each taxon (simulating unequal missing data in rare taxa). Gower's distance was found to be the best-performing non-estimation method, and Bayesian PCA the best-performing estimation method. Specimens of taxa with small sample sizes and the most morphologically disparate specimens had the highest estimation error. The distribution of missing data had a significant effect on estimation error for almost all methods and proportions. Taxonomically biased missing data tended to show trends similar to random missing data, but with higher error rates. Anatomically biased missing data deviated much more from random than taxonomically biased data did, with magnitudes dependent on the estimation method.
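The comparison pipeline summarized in the abstract (delete values, estimate them, rerun PCA, and compare the result with the complete-data ordination by Procrustes superimposition) can be sketched roughly as below. This is a minimal sketch in Python with NumPy under stated assumptions, not the authors' implementation: the data are synthetic, only the purely random deletion scheme is shown, and column-mean imputation is a simple stand-in for the estimation methods the study actually compared (such as Bayesian PCA). The function names `pca_scores`, `introduce_missing_at_random`, `mean_impute` and `procrustes_disparity` are hypothetical, and comparing only the first two PC scores is an assumption.

```python
import numpy as np


def pca_scores(X, k=2):
    """Scores on the first k principal components, via SVD of the column-centred data."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def introduce_missing_at_random(X, proportion, rng):
    """Copy of X with `proportion` of its cells set to NaN, chosen uniformly at random.

    Only the purely random scheme; biased schemes would weight this choice instead.
    """
    X = X.astype(float).copy()
    n_missing = int(round(proportion * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    X.flat[idx] = np.nan
    return X


def mean_impute(X):
    """Replace each NaN with its column mean (a stand-in, not one of the study's estimators)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def procrustes_disparity(A, B):
    """Residual sum of squares after optimally translating, scaling and rotating B onto A.

    Both configurations are centred and scaled to unit Frobenius norm, so with the
    optimal rotation and scale the residual equals 1 - (sum of singular values of A^T B)^2.
    Returns a value in [0, 1]; 0 means identical up to a similarity transform.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return max(0.0, float(1.0 - s.sum() ** 2))  # clip tiny negative rounding error


rng = np.random.default_rng(0)
# Hypothetical data: 40 specimens x 6 measurements driven mostly by one size factor
size = rng.normal(size=(40, 1))
X_full = size @ rng.uniform(0.5, 1.5, size=(1, 6)) + 0.1 * rng.normal(size=(40, 6))
reference = pca_scores(X_full)

for p in (0.05, 0.2, 0.4):
    X_missing = introduce_missing_at_random(X_full, p, rng)
    d = procrustes_disparity(reference, pca_scores(mean_impute(X_missing)))
    print(f"{p:.0%} missing -> Procrustes disparity {d:.4f}")
```

Under these assumptions, the anatomically and taxonomically biased schemes would change only how the deletion mask is drawn (for example, by weighting the choice of cells by measurement or by taxon), while the estimation and Procrustes comparison steps stay the same.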
Pages: 941–954
Number of pages: 14