Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans

Cited by: 24
Authors
Xavier, A. [1]
Muir, William M. [2]
Rainey, Katy M. [1]
Affiliations
[1] Purdue Univ, Dept Agron, Lilly Hall Life Sci, 915 W State St, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Anim Sci, Lilly Hall Life Sci, 915 W State St, W Lafayette, IN 47907 USA
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
Empirical Bayes; Heritability; Genomic selection; Association studies; WHOLE-GENOME REGRESSION; INCREASES POWER; GENOTYPE DATA; PREDICTION; SELECTION; ACCURACY; MARKERS; MODEL; PLANT; ASSOCIATION;
DOI
10.1186/s12859-016-0899-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704
Abstract
Background: Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of the analysis will be. Nevertheless, some next-generation technologies seek to provide genotypic information despite large proportions of missing data. The procedures these technologies use to impute genetic data therefore strongly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of the imputation method on heritability and on the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were multivariate mixed model, hidden Markov model, logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition.

Results: We propose an imputation method based on multivariate mixed models using pedigree information. Our comparison of methods indicates that the heritability of traits can be affected by the imputation method. Genotypes with missing values imputed by methods that use genealogical information can favor the genetic analysis of highly polygenic traits, but not genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper.

Conclusions: We conclude that hidden Markov models and random forest imputation are more suitable for studies that aim to analyze highly heritable traits, while pedigree-based methods are better suited to the analysis of traits with low heritability.
Despite the notable contribution to heritability, changing the imputation method did not yield advantages in genomic prediction. We identified significant differences across imputation methods in a dataset missing 20% of the genotypic values. This means that genotypic data from technologies that produce a high proportion of missing values, such as genotyping-by-sequencing (GBS), should be handled carefully, because the imputation method will affect downstream analyses.
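The abstract names k-nearest neighbor among the imputation methods compared and sets the evaluation of imputation accuracy as one of the study's aims. The sketch that follows is an illustration of how such an evaluation can be set up, not the authors' implementation or protocol: it hides 20% of the observed calls in a complete toy panel, refills them with a plain k-nearest-neighbor imputer, and reports the proportion of hidden calls recovered. Everything specific here is an assumption: the 0/1/2 genotype coding, the distance (mean absolute difference over markers observed in both individuals), the value of k, the half-up rounding rule, the masking protocol, the function names, and the toy data are all hypothetical, and the abstract does not state how accuracy was measured in the study.

```python
import random
from typing import List, Optional, Tuple

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None marks a missing call.
Genotype = Optional[int]
Matrix = List[List[Genotype]]


def distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Mean absolute genotype difference over markers observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(geno: Matrix, k: int = 3) -> Matrix:
    """Fill each missing call with the rounded mean genotype of its k nearest donors."""
    imputed = [row[:] for row in geno]  # leave the input matrix untouched
    for i, row in enumerate(geno):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # Donors: other individuals observed at marker j. Sorting (distance, genotype)
            # pairs breaks distance ties deterministically toward lower genotype codes.
            donors = sorted(
                (distance(row, other), other[j])
                for r, other in enumerate(geno)
                if r != i and other[j] is not None
            )[:k]
            if donors:  # if nobody is observed at marker j, the call stays missing
                mean = sum(g for _, g in donors) / len(donors)
                imputed[i][j] = int(mean + 0.5)  # round half up to 0, 1 or 2
    return imputed


def mask_calls(geno: Matrix, fraction: float, seed: int = 0) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Hide a random `fraction` of observed calls; return the masked copy and hidden cells."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(geno) for j, g in enumerate(row) if g is not None]
    hidden = rng.sample(observed, round(fraction * len(observed)))
    masked = [row[:] for row in geno]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def imputation_accuracy(truth: Matrix, imputed: Matrix, hidden: List[Tuple[int, int]]) -> float:
    """Proportion of hidden calls whose imputed genotype equals the true genotype."""
    return sum(truth[i][j] == imputed[i][j] for i, j in hidden) / len(hidden)


if __name__ == "__main__":
    # Hypothetical toy panel: two groups of five individuals with contrasting genotypes.
    truth = [[0, 0, 2, 2, 0, 1] for _ in range(5)] + [[2, 2, 0, 0, 2, 1] for _ in range(5)]
    masked, hidden = mask_calls(truth, fraction=0.2, seed=1)
    imputed = knn_impute(masked, k=3)
    print(f"masked {len(hidden)} calls, accuracy = {imputation_accuracy(truth, imputed, hidden):.2f}")
```

Because this distance ignores marker order and pedigree, the sketch uses none of the linkage or genealogical information that the hidden Markov model and pedigree-based methods named in the abstract can draw on. Repeating the masking with several seeds would give a less noisy accuracy estimate than a single draw.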
Pages: 9