Analysis of multilocus fingerprinting data sets containing missing data

Cited by: 357
Authors
Schlueter, Philipp M.
Harris, Stephen A.
Affiliations
[1] Univ Vienna, Inst Bot, Dept Systemat & Evolutionary Bot, A-1030 Vienna, Austria
[2] Univ Oxford, Dept Plant Sci, Oxford OX1 3RB, England
Source
MOLECULAR ECOLOGY NOTES | 2006 / Vol. 6 / Issue 02
Keywords
DNA fingerprinting; dominant markers; Jaccard's similarity coefficient; missing data; Shannon's diversity index;
DOI
10.1111/j.1471-8286.2006.01225.x
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Missing data are commonly encountered when using multilocus, fragment-based (dominant) fingerprinting methods, such as random amplified polymorphic DNA (RAPD) or amplified fragment length polymorphism (AFLP). Data sets containing missing data have been analysed by eliminating those bands or samples with missing data, by assigning values to missing data, or by ignoring the problem. Here, we present a method that uses random assignments of band presence-absence to the missing data, implemented by the computer program FAMD (available from http://homepage.univie.ac.at/philipp.maria.schlueter/famd.html), for analyses based on pairwise similarity and Shannon's index. When missing values cluster within a data set, sample or band elimination is likely to be the most appropriate action. However, when missing values are scattered across the data set, minimum, maximum and average similarity coefficients are a simple means of visualizing the effects of missing data on tree structure. Our approach indicates the range of values that a data set containing missing data points might generate, and forces the investigator to consider the effects of missing values on data interpretation.
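The random-assignment idea described in the abstract can be illustrated with a minimal Python sketch. This is not the FAMD implementation: the function names, the `MISSING` marker, the 0.5 probability of assigning a band as present, and the number of replicates are all assumptions made for illustration. The sketch fills each missing score at random, recomputes Jaccard's similarity coefficient for a pair of samples over many replicates, and reports the minimum, average and maximum values obtained.

```python
import random

MISSING = None  # hypothetical marker for an unscored band (assumption)


def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles:
    shared presences divided by bands present in at least one sample."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0


def impute_randomly(profile, rng, p_present=0.5):
    """Replace each missing score with a random presence (1) or absence (0).
    p_present = 0.5 is an illustrative assumption, not a value from the paper."""
    return [
        (1 if rng.random() < p_present else 0) if v is MISSING else v
        for v in profile
    ]


def similarity_range(a, b, replicates=1000, seed=0):
    """Return (minimum, average, maximum) Jaccard similarity over
    random assignments of presence-absence to the missing scores."""
    rng = random.Random(seed)
    values = [
        jaccard(impute_randomly(a, rng), impute_randomly(b, rng))
        for _ in range(replicates)
    ]
    return min(values), sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Two hypothetical samples scored at five bands, each with one missing score.
    sample_a = [1, 1, 0, MISSING, 1]
    sample_b = [1, 0, 0, 1, MISSING]
    lo, mean, hi = similarity_range(sample_a, sample_b)
    print(f"min={lo:.2f} mean={mean:.2f} max={hi:.2f}")
```

A wide gap between the minimum and maximum for a pair of samples suggests that the missing scores could materially change that pair's position in a similarity-based tree; a narrow gap suggests they matter little.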
Pages: 569-572
Page count: 4