High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing

Cited by: 8
Authors
Schneider, Michael [1, 2]
Shrestha, Asis [1, 2]
Ballvora, Agim [1]
Leon, Jens [1]
Affiliations
[1] Univ Bonn, Inst Crop Sci & Resource Conservat, Plant Breeding, Katzenburgweg 5, D-53115 Bonn, Germany
[2] Univ Duesseldorf, Inst Quantitat Genet & Genom Plants, Univ Str 1, D-40225 Dusseldorf, Germany
Keywords
Pool sequencing; Genotyping; Allele frequency estimation; Single nucleotide polymorphisms; Haplotype; Hordeum vulgare; ALIGNMENT; ADAPTATION; EVOLUTION; ACCURACY; GENES;
DOI
10.1186/s13007-022-00852-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: In addition to heterogeneity and artificial selection, natural selection is one of the forces used to combat climate change and improve agrobiodiversity in evolutionary plant breeding. Accurate identification of the specific genomic effects of natural selection will likely accelerate transfer between populations. Thus, insights into changes in allele frequency, adequate population size, gene flow and drift are essential. However, observing such effects often involves a trade-off between costs and resolution when a large sample of genotypes for many loci is analysed. Pool genotyping approaches achieve high resolution and precision in estimating allele frequency when sequence coverage is high. Nevertheless, high-coverage pool sequencing of large genomes is expensive.
Results: Three pool samples (n = 300, 300, 288) from a barley backcross population were generated to assess the population's allele frequency. The tested population (BC2F21) has undergone 18 generations of natural adaptation to conventional farming practice. The accuracies of estimated pool-based allele frequencies and genome coverage yields were compared using three next-generation sequencing genotyping methods. To achieve accurate allele frequency estimates with low sequence coverage, we employed a haplotyping approach. Low-coverage allele frequencies of closely located single polymorphisms were aggregated into a single haplotype allele frequency, yielding 2- to 271-fold higher depth and increased precision. When we combined different haplotyping tactics, we found that gene and chip marker-based haplotype analyses performed as well as or better than simple contig haplotype windows. Comparing multiple pool samples and referencing against an individual sequencing approach revealed that whole-genome pool re-sequencing (WGS) achieved the highest correlation with individual genotyping (≥ 0.97). In contrast, transcriptome-based genotyping (MACE) and genotyping by sequencing (GBS) pool replicates were significantly associated with higher error rates and lower correlations, but remain valuable for detecting large allele frequency variations.
Conclusions: The proposed strategy identified the allele frequency of populations with high accuracy at low cost. This is particularly relevant to evolutionary plant breeding of crops with very large genomes, such as barley. Whole-genome low-coverage re-sequencing at 0.03× coverage per genotype accurately estimated the allele frequency when a loci-based haplotyping approach was applied. The implementation of annotated haplotypes capitalises on the biological background and statistical robustness.
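The aggregation step described in the abstract, where low-coverage allele counts of neighbouring polymorphisms are combined into one haplotype allele frequency, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that every SNP's counted allele has already been assigned to the same parental haplotype, that summing read counts within a block is an acceptable estimator, and that blocks are fixed 1 kb windows. The function name, the window rule and the read counts are all hypothetical.

```python
from collections import defaultdict


def haplotype_allele_frequencies(snps, block_of):
    """Pool per-SNP read counts into one allele frequency per haplotype block.

    snps     -- iterable of (position, donor_reads, total_reads) tuples
    block_of -- function mapping a position to a block label
                (a gene, a chip-marker interval, or a fixed window)
    Returns {block: (pooled_frequency, pooled_depth)}.
    """
    donor = defaultdict(int)
    depth = defaultdict(int)
    for position, donor_reads, total_reads in snps:
        block = block_of(position)
        donor[block] += donor_reads
        depth[block] += total_reads
    # Blocks without any reads are skipped to avoid division by zero.
    return {b: (donor[b] / depth[b], depth[b]) for b in depth if depth[b] > 0}


# Hypothetical low-coverage pool data, not taken from the paper:
# (position, reads carrying the donor-parent allele, total reads)
snps = [(100, 1, 3), (250, 0, 2), (900, 2, 4), (1200, 3, 3), (1800, 1, 2)]

# Assumed block rule: fixed 1 kb windows.
result = haplotype_allele_frequencies(snps, lambda pos: pos // 1000)
for block, (freq, pooled_depth) in sorted(result.items()):
    print(block, round(freq, 3), pooled_depth)
```

In this toy input, no single SNP has more than four reads, while the pooled blocks reach depths of nine and five. That is the kind of depth gain the abstract refers to, here on invented numbers.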
Pages: 18
Related papers
50 in total
  • [1] High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing
    Michael Schneider
    Asis Shrestha
    Agim Ballvora
    Jens Léon
    Plant Methods, 18
  • [2] Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples
    Michael P Mullen
    Christopher J Creevey
    Donagh P Berry
    Matt S McCabe
    David A Magee
    Dawn J Howard
    Aideen P Killeen
    Stephen D Park
    Paul A McGettigan
    Matt C Lucy
    David E MacHugh
    Sinead M Waters
    BMC Genomics, 13
  • [3] Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples
    Mullen, Michael P.
    Creevey, Christopher J.
    Berry, Donagh P.
    McCabe, Matt S.
    Magee, David A.
    Howard, Dawn J.
    Killeen, Aideen P.
    Park, Stephen D.
    McGettigan, Paul A.
    Lucy, Matt C.
    MacHugh, David E.
    Waters, Sinead M.
    BMC GENOMICS, 2012, 13
  • [4] Evaluation of Allele Frequency Estimation Using Pooled Sequencing Data Simulation
    Guo, Yan
    Samuels, David C.
    Li, Jiang
    Clark, Travis
    Li, Chung-I
    Shyr, Yu
    SCIENTIFIC WORLD JOURNAL, 2013
  • [5] USING HIGH-THROUGHPUT ALLELE FREQUENCY SEQUENCING TO MONITOR BACTERIAL POPULATION COMPOSITION IN CF INFECTIONS
    Jorth, P.
    Siehnel, R. J.
    Staudinger, B.
    Goddard, A. F.
    Singh, P.
    PEDIATRIC PULMONOLOGY, 2014, 49 : 348 - 349
  • [6] How to optimize the precision of allele and haplotype frequency estimates using pooled-sequencing data
    Rode, Nicolas O.
    Holtz, Yan
    Loridon, Karine
    Santoni, Sylvain
    Ronfort, Joelle
    Gay, Laurene
    MOLECULAR ECOLOGY RESOURCES, 2018, 18 (02) : 194 - 203
  • [7] LDx: Estimation of Linkage Disequilibrium from High-Throughput Pooled Resequencing Data
    Feder, Alison F.
    Petrov, Dmitri A.
    Bergland, Alan O.
    PLOS ONE, 2012, 7 (11):
  • [8] Genotype-Frequency Estimation from High-Throughput Sequencing Data
    Maruki, Takahiro
    Lynch, Michael
    GENETICS, 2015, 201 (02) : 473 - +
  • [9] Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data
    Bilton, Timothy P.
    McEwan, John C.
    Clarke, Shannon M.
    Brauning, Rudiger
    van Stijn, Tracey C.
    Rowe, Suzanne J.
    Dodds, Ken G.
    GENETICS, 2018, 209 (02) : 389 - 400
  • [10] Estimation of haplotype frequencies and diplotype configuration for each subject using pooled DNA data.
    Ito, T
    Chiku, S
    Inoue, E
    Tomita, M
    Kamatani, N
    AMERICAN JOURNAL OF HUMAN GENETICS, 2002, 71 (04) : 449 - 449