Analysis of data consistency identifies measurement abnormality in Howells' craniometric test data set

Cited by: 1
Authors
Pang, Jinyong [1, 2]
Dong, Yibo [1, 2, 4]
Turner, Christopher [3]
Li, Chang [1, 2]
Liu, Xiaoming [1, 2]
Affiliations
[1] Univ S Florida, USF Genom, 3720 Spectrum Blvd,Suite 304, Tampa, FL 33612 USA
[2] Univ S Florida, Coll Publ Hlth, 3720 Spectrum Blvd,Suite 304, Tampa, FL 33612 USA
[3] Univ S Florida, Coll Arts & Sci, Dept Anthropol, Tampa, FL 33612 USA
[4] Bur Publ Hlth Labs, 1217 N Pearl St, Jacksonville, FL USA
Source
Keywords
data consistency; Howells' craniometric data; simotic chord; simotic subtense; SIS; WNB
DOI
10.1002/ajpa.24631
Chinese Library Classification number
Q98 [Anthropology];
Discipline classification code
030303;
Abstract
Howells' craniometric data set is the largest publicly available craniometric data set on the internet and has been widely used in the development of craniometric methods. It consists of a main data set of 2524 human crania from 28 populations and an additional "test" data set of 524 crania. Up to 82 measurements were recorded from those crania. We studied the consistency between the main and test data sets to assess whether the two can be used in combination. We found that the two data sets can be clearly separated via Uniform Manifold Approximation and Projection, suggesting some inconsistency between them. To investigate the cause, we split the two data sets into six continental groups (African, Austro-Melanesian, East Asian, European, Native American, and Polynesian) and tested, for each group, whether the distributions differ between the two data sets. We found that the measures of simotic chord (WNB) and simotic subtense (SIS) are significantly and abnormally larger in the test data set than in the main data set. After removing these two measures, the two data sets are broadly comparable. We further present evidence that missing decimal points likely caused the abnormality.
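The missing-decimal-point explanation can be illustrated with a simple scale check. The sketch below is not the authors' procedure: it only compares per-measurement medians between two data sets and flags measurements whose ratio is close to 10. The function names, the tolerance, and all numeric values are hypothetical, and the data are synthetic rather than taken from Howells' files.

```python
import random
import statistics


def median_ratio(main_values, test_values):
    """Ratio of the test-set median to the main-set median."""
    return statistics.median(test_values) / statistics.median(main_values)


def flag_scale_shifts(main, test, factor=10.0, tolerance=0.25):
    """Return measurements whose test/main median ratio is close to `factor`.

    `main` and `test` map a measurement code to a list of values.
    A ratio within `tolerance` (relative) of `factor` is consistent with
    a dropped decimal point, though it does not prove one.
    """
    flagged = {}
    for code in main.keys() & test.keys():
        ratio = median_ratio(main[code], test[code])
        if abs(ratio / factor - 1.0) <= tolerance:
            flagged[code] = ratio
    return flagged


# Synthetic values only (assumed means and spreads, not real measurements).
rng = random.Random(0)
main = {
    "WNB": [rng.gauss(9.0, 1.5) for _ in range(200)],
    "SIS": [rng.gauss(4.0, 1.0) for _ in range(200)],
    "GOL": [rng.gauss(180.0, 8.0) for _ in range(200)],
}
test = {
    # Simulate a lost decimal point: 9.3 recorded as 93.
    "WNB": [round(rng.gauss(9.0, 1.5), 1) * 10 for _ in range(50)],
    "SIS": [round(rng.gauss(4.0, 1.0), 1) * 10 for _ in range(50)],
    # Recorded on the same scale as the main set.
    "GOL": [rng.gauss(180.0, 8.0) for _ in range(50)],
}

flagged = flag_scale_shifts(main, test)
print(sorted(flagged))
```

A median ratio is used here because it is robust to a few outlying crania; a fuller analysis would apply a formal distribution test within each continental group, as the abstract describes.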
Pages: 687-692
Number of pages: 6