Genotype-free estimation of allele frequencies reduces bias and improves demographic inference from RADSeq data

Cited by: 29
Authors
Warmuth, Vera M. [1, 2]
Ellegren, Hans [1]
Affiliations
[1] Uppsala Univ, Evolutionary Biol Ctr, Dept Evolutionary Biol, Uppsala, Sweden
[2] Ludwig Maximilians Univ Munchen, Fac Biol, Div Evolutionary Biol, Martinsried, Germany
Funding
Swedish Research Council
Keywords
allele frequencies; angsd; demographic inference; genotyping error; RADSeq; GENOMIC DATA; POPULATION; SPECTRUM; FRAMEWORK; COVERAGE; GENETICS; POWER; STATISTICS; PHYLOGENY; EVOLUTION;
DOI
10.1111/1755-0998.12990
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Restriction-site associated DNA sequencing (RADSeq) facilitates rapid generation of thousands of genetic markers at relatively low cost; however, several sources of error specific to RADSeq methods often lead to biased estimates of allele frequencies and thereby to erroneous population genetic inference. Estimating the distribution of sample allele frequencies without calling genotypes has been shown to improve population inference from whole-genome sequencing data, but the ability of this approach to account for RADSeq-specific biases remains unexplored. Here we assess to what extent genotype-free methods of allele frequency estimation affect demographic inference from empirical RADSeq data. Using the well-studied pied flycatcher (Ficedula hypoleuca) as a study system, we compare allele frequency estimation and demographic inference from whole-genome sequencing data with that from sample-matched RADSeq data, using both genotype-based and genotype-free methods. The demographic history of pied flycatchers as inferred from RADSeq data was highly congruent with that inferred from whole-genome resequencing (WGS) data when allele frequencies were estimated directly from the read data. In contrast, when allele frequencies were derived from called genotypes, RADSeq-based estimates of most model parameters fell outside the 95% confidence interval of estimates derived from WGS data. Notably, more stringent filtering of the genotype calls tended to increase the discrepancy between parameter estimates from WGS and RADSeq data. The results of this study demonstrate the ability of genotype-free methods to improve allele frequency spectrum (AFS)-based demographic inference from empirical RADSeq data and highlight the need to account for uncertainty in NGS data regardless of sequencing method.
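To illustrate what estimating allele frequencies "directly from the read data" can mean, the following is a minimal sketch of one common genotype-free approach: an expectation-maximization (EM) estimate of the allele frequency at a single biallelic site from per-individual genotype likelihoods, assuming Hardy-Weinberg proportions. It is not the authors' pipeline; the function names (`hwe_prior`, `estimate_allele_frequency`), the input layout, and the example likelihood values are all assumptions made for illustration.

```python
def hwe_prior(f):
    """Genotype probabilities for 0, 1, 2 alternate alleles under Hardy-Weinberg."""
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)


def estimate_allele_frequency(genotype_likelihoods, f_init=0.2,
                              max_iter=200, tol=1e-8):
    """EM estimate of the alternate-allele frequency at one biallelic site.

    genotype_likelihoods: one (L0, L1, L2) tuple per individual, where Lg is
    the likelihood of the reads given g alternate alleles. Each tuple is
    assumed to contain at least one positive value (hypothetical input format).
    """
    eps = 1e-6  # keep f away from 0 and 1 so every genotype keeps a nonzero prior
    f = min(max(f_init, eps), 1.0 - eps)
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        prior = hwe_prior(f)
        expected_alt = 0.0
        for gl in genotype_likelihoods:
            # E-step: posterior genotype weights for this individual
            weights = [l * p for l, p in zip(gl, prior)]
            total = sum(weights)
            # Expected number of alternate alleles carried by this individual
            expected_alt += (weights[1] + 2.0 * weights[2]) / total
        # M-step: expected alternate alleles divided by total allele count
        f_new = min(max(expected_alt / (2.0 * n), eps), 1.0 - eps)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


# Illustrative, made-up likelihoods: the last two individuals are ambiguous
# (e.g. low coverage), so no hard genotype call is made for them.
example = [(0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0), (0.3, 0.4, 0.3)]
print(estimate_allele_frequency(example))
```

The point of the sketch is that ambiguous individuals contribute fractional allele counts weighted by their likelihoods, instead of being forced into a single called genotype or filtered out.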
Pages: 586-596
Number of pages: 11