Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans

Cited by: 24
Authors
Xavier, A. [1]
Muir, William M. [2]
Rainey, Katy M. [1]
Affiliations
[1] Purdue Univ, Dept Agron, Lilly Hall Life Sci,915 W State St, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Anim Sci, Lilly Hall Life Sci,915 W State St, W Lafayette, IN 47907 USA
Source
BMC BIOINFORMATICS | 2016 / Volume 17
Keywords
Empirical Bayes; Heritability; Genomic selection; Association studies; WHOLE-GENOME REGRESSION; INCREASES POWER; GENOTYPE DATA; PREDICTION; SELECTION; ACCURACY; MARKERS; MODEL; PLANT; ASSOCIATION;
DOI
10.1186/s12859-016-0899-7
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of the analysis will be. Nevertheless, some next-generation genotyping technologies provide genotypic information despite large proportions of missing data. The procedures used to impute the missing genotypes therefore greatly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of imputation method on heritability and the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were a multivariate mixed model, a hidden Markov model, a logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition.
Results: We propose an imputation method based on multivariate mixed models using pedigree information. Our comparison of methods indicates that the heritability of traits can be affected by the imputation method. Imputing missing genotypes with methods that use genealogical information can favor genetic analysis of highly polygenic traits, but it does not improve genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper.
Conclusions: We conclude that hidden Markov models and random forest imputation are better suited to studies of highly heritable traits, while pedigree-based methods are best used to analyze traits with low heritability. Despite the notable contribution to heritability, changing the imputation method brought no advantage in genomic prediction. We identified significant differences across imputation methods in a dataset missing 20% of the genotypic values. This means that genotypic data from technologies that yield a high proportion of missing values, such as genotyping-by-sequencing (GBS), should be handled carefully because the imputation method will affect downstream analysis.
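To make the idea of evaluating imputation accuracy concrete, the following is a minimal sketch of one of the listed method families, k-nearest neighbor imputation, scored by hiding known genotype calls and checking how many are recovered. It is not the authors' implementation. The 0/1/2 allele-count coding, the function names `knn_impute` and `imputation_accuracy`, the distance measure, and the toy matrix are all assumptions made for illustration only.

```python
from collections import Counter


def knn_impute(genotypes, k=3):
    """Fill None entries with the most common call among the k nearest lines.

    genotypes: list of rows (one per line), calls coded 0/1/2, None = missing.
    This coding and the distance below are illustrative assumptions.
    """
    def distance(a, b):
        # Mean absolute difference over markers observed in both lines.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sum(abs(x - y) for x, y in shared) / len(shared)

    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # Donors: other lines with an observed call at marker j.
            donors = sorted(
                (distance(row, other), idx)
                for idx, other in enumerate(genotypes)
                if idx != i and other[j] is not None
            )
            votes = Counter(genotypes[idx][j] for _, idx in donors[:k])
            if votes:  # leave the call missing if no donor exists
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


def imputation_accuracy(truth, masked, imputed):
    """Share of deliberately hidden calls that were recovered exactly."""
    hidden = [(i, j) for i, row in enumerate(masked)
              for j, call in enumerate(row)
              if call is None and truth[i][j] is not None]
    if not hidden:
        return float("nan")
    hits = sum(imputed[i][j] == truth[i][j] for i, j in hidden)
    return hits / len(hidden)


# Toy data (invented for illustration): two groups of identical lines.
truth = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
masked = [row[:] for row in truth]
masked[0][3] = None  # hide two known calls
masked[4][1] = None

imputed = knn_impute(masked, k=2)
accuracy = imputation_accuracy(truth, masked, imputed)
```

Masking known calls and comparing against the truth is a common way to score any imputation method, so `imputation_accuracy` could be reused unchanged with a different imputer in place of `knn_impute`.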
Pages: 9