Evaluation of the principal-component and expectation-maximization methods for estimating missing data in morphometric studies

Times cited: 0
Authors
Strauss, RE [1]
Atanassov, MN
De Oliveira, JA
Affiliations
[1] Texas Tech Univ, Dept Sci Biol, Lubbock, TX 79409 USA
[2] Texas Tech Univ, Dept Geosci, Lubbock, TX 79409 USA
[3] Univ Fed Rio de Janeiro, Museu Nacl, Dept Vertebrados, Rio de Janeiro, Brazil
DOI
10.1671/0272-4634(2003)023[0284:EOTPAE]2.0.CO;2
Chinese Library Classification (CLC) number
Q91 [Paleontology]
Discipline classification codes
0709; 070903
Abstract
Vertebrate skeletons, particularly fossils, commonly have damaged, distorted, or missing structures. Because multivariate morphometric methods require complete data matrices, there are two possible solutions: to omit the specimens or characters having missing values, or to estimate missing values from the remainder of the data. Omission of specimens or characters reduces the data available for analysis, and thus the power to detect patterns or differences. Univariate and bivariate-regression methods are known to reduce the total variance of the data, and thus are not considered here. We compared the two most common multivariate methods: expectation-maximization (EM), which uses the covariance matrix directly, and principal-component (PC) estimation, based on regression of characters on principal components. Performance was evaluated by computer simulation of randomly introduced missing data in constructed data sets of known structure, and in several complete fossil (Pterodactylus skeleton) and recent (Alligator skeleton, Canis skull) data sets. The EM and PC methods displayed consistent and similar patterns of behavior for varying combinations of specimens and characters and across a broad range of amounts of missing data. Reliability was greatest for moderate numbers of characters (6-12) and larger sample sizes. For fewer characters the maximum amount of missing data that can be predicted increases substantially, but with a decrease in reliability. Both methods produce accurate estimates of missing values, but EM estimates are more precise. EM also outperforms the PC method in the maximum proportion of missing values that can be reliably estimated (almost 50% for small numbers of characters).
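The abstract says only that EM "uses the covariance matrix directly." Below is a minimal pure-Python sketch of the standard EM algorithm for an incomplete multivariate normal sample, offered to illustrate that idea. It is not the authors' implementation. The E-step fills each missing character with its conditional mean given the observed characters. It also accumulates the conditional covariance so that the M-step covariance is not deflated. The function names (`solve`, `moments`, `em_impute`, `simulate`), the one-factor synthetic data, and the 15% missing rate are hypothetical choices for illustration. The PC method is not sketched.

```python
import math
import random


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is k x k and B is k x r (lists of lists); returns X as k x r.
    """
    k = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]


def moments(X, S=None):
    """Column means and 1/n covariance of X; S adds summed conditional covariances."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    sigma = [[sum(row[a] * row[b] for row in X) / n - mu[a] * mu[b]
              for b in range(p)] for a in range(p)]
    if S is not None:
        sigma = [[sigma[a][b] + S[a][b] / n for b in range(p)] for a in range(p)]
    return mu, sigma


def em_impute(data, max_iter=200, tol=1e-8):
    """Fill None cells of `data` (specimens x characters) under a multivariate normal model."""
    n, p = len(data), len(data[0])
    # Start from mean substitution (assumes every character has some observed values).
    start = []
    for j in range(p):
        obs = [row[j] for row in data if row[j] is not None]
        start.append(sum(obs) / len(obs))
    X = [[start[j] if v is None else v for j, v in enumerate(row)] for row in data]
    mu, sigma = moments(X)
    for _ in range(max_iter):
        S = [[0.0] * p for _ in range(p)]
        for i, row in enumerate(data):
            m = [j for j in range(p) if row[j] is None]
            o = [j for j in range(p) if row[j] is not None]
            if not m:
                continue
            # E-step: W = Sigma_oo^-1 Sigma_om regresses missing on observed characters.
            W = solve([[sigma[a][b] for b in o] for a in o],
                      [[sigma[a][b] for b in m] for a in o])
            resid = [row[a] - mu[a] for a in o]
            for t, jm in enumerate(m):
                X[i][jm] = mu[jm] + sum(W[s][t] * resid[s] for s in range(len(o)))
                # Conditional covariance Sigma_mm - Sigma_mo W restores the spread
                # that filling in conditional means would otherwise remove.
                for u, jn in enumerate(m):
                    S[jm][jn] += sigma[jm][jn] - sum(
                        sigma[jm][a] * W[s][u] for s, a in enumerate(o))
        # M-step: re-estimate mean and covariance from the completed matrix.
        new_mu, new_sigma = moments(X, S)
        delta = max(abs(x - y) for x, y in
                    zip(new_mu + sum(new_sigma, []), mu + sum(sigma, [])))
        mu, sigma = new_mu, new_sigma
        if delta < tol:
            break
    return X, mu, sigma


def simulate(seed=1, n=100, p=5, frac_missing=0.15):
    """Hypothetical check: RMSE of EM vs mean substitution on one-factor synthetic data."""
    rng = random.Random(seed)
    truth = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared "size" factor so the characters covary
        truth.append([z + 0.3 * rng.gauss(0.0, 1.0) for _ in range(p)])
    data = [[None if rng.random() < frac_missing else v for v in row] for row in truth]
    filled, _, _ = em_impute(data)
    cells = [(i, j) for i in range(n) for j in range(p) if data[i][j] is None]
    col_mean = [sum(r[j] for r in data if r[j] is not None)
                / sum(r[j] is not None for r in data) for j in range(p)]
    em_rmse = math.sqrt(sum((filled[i][j] - truth[i][j]) ** 2 for i, j in cells) / len(cells))
    mean_rmse = math.sqrt(sum((col_mean[j] - truth[i][j]) ** 2 for i, j in cells) / len(cells))
    return em_rmse, mean_rmse


em_rmse, mean_rmse = simulate()
print(f"EM RMSE {em_rmse:.3f}; mean-substitution RMSE {mean_rmse:.3f}")
```

Mean substitution appears here only as a simple baseline for the error of the estimated cells. It does not reproduce the paper's simulation design, which varied the numbers of specimens and characters and the amount of missing data.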
Pages: 284-296
Page count: 13