A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits

Cited by: 14
Authors
Heller, Rasmus [1]
Nursyifa, Casia [1]
Garcia-Erill, Genis [1]
Salmona, Jordi [2]
Chikhi, Lounes [2, 3]
Meisner, Jonas [1]
Korneliussen, Thorfinn Sand [4]
Albrechtsen, Anders [1]
Affiliations
[1] Univ Copenhagen, Dept Biol, Sect Computat & RNA Biol, DK-2200 Copenhagen N, Denmark
[2] Univ Paul Sabatier, CNRS, ENFA, UMR 5174 EDB,Lab Evolut & Div Biol, Toulouse, France
[3] Inst Gulbenkian Ciencias, Oeiras, Portugal
[4] Univ Copenhagen, GLOBE Inst, Sect GeoGenet, Copenhagen K, Denmark
Keywords
allelic dropout; genetic diversity; genotype calling; genotype likelihood; RADseq; site frequency spectrum; READ ALIGNMENT; DISCOVERY; ASSOCIATION; DIVERSITY; FRAMEWORK; GENOTYPE; MAPS; SET
DOI
10.1111/1755-0998.13324
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. Such organisms often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, such software comes with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth of coverage, we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2-4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data, we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
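The effect of allelic dropout on the SFS and on heterozygosity described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' pipeline and uses no STACKS or NGS tooling; the function names, the haplotype-pair data layout and the tiny data set are all assumptions made for illustration. It assumes that when one haplotype of an individual is lost at a RAD locus, the individual is observed as homozygous for the allele on the surviving haplotype.

```python
# Toy illustration (hypothetical names and data, not the paper's method).
# Each site holds one (hap0, hap1) pair per individual; 0 = ancestral, 1 = derived.

def observed_genotypes(haplotypes, dropout):
    """Return derived-allele counts (0, 1 or 2) per site and individual.

    dropout maps (site_index, individual_index) -> index of the lost haplotype.
    An individual with a lost haplotype appears homozygous for the other one.
    """
    observed = []
    for s, site in enumerate(haplotypes):
        row = []
        for i, (a0, a1) in enumerate(site):
            lost = dropout.get((s, i))
            if lost == 0:
                row.append(2 * a1)   # only haplotype 1 is sequenced
            elif lost == 1:
                row.append(2 * a0)   # only haplotype 0 is sequenced
            else:
                row.append(a0 + a1)  # both haplotypes are sequenced
        observed.append(row)
    return observed

def site_frequency_spectrum(genotypes):
    """sfs[k] = number of sites carrying k derived alleles (k = 0 is invariant)."""
    n_chromosomes = 2 * len(genotypes[0])
    sfs = [0] * (n_chromosomes + 1)
    for site in genotypes:
        sfs[sum(site)] += 1
    return sfs

def heterozygosity(genotypes):
    """Fraction of all genotypes that are heterozygous."""
    calls = [g for site in genotypes for g in site]
    return sum(1 for g in calls if g == 1) / len(calls)

# Three sites, four diploid individuals (made-up data).
haps = [
    [(0, 1), (0, 0), (0, 0), (0, 0)],  # singleton
    [(0, 1), (0, 1), (0, 0), (1, 1)],  # common variant
    [(0, 0), (0, 0), (0, 0), (0, 0)],  # invariant
]
# Assumed dropout events: the derived haplotype is lost in two heterozygotes.
dropout = {(0, 0): 1, (1, 1): 1}

true_gt = observed_genotypes(haps, {})
ad_gt = observed_genotypes(haps, dropout)

true_sfs = site_frequency_spectrum(true_gt)
ad_sfs = site_frequency_spectrum(ad_gt)
true_het = heterozygosity(true_gt)
ad_het = heterozygosity(ad_gt)

print(true_sfs, true_het)
print(ad_sfs, ad_het)
```

In this made-up example the singleton site becomes invariant and the common variant loses one derived copy, so the spectrum shifts towards lower-frequency and invariant categories while heterozygosity falls from 3/12 to 1/12. The magnitudes are arbitrary and bear no relation to the ~16% deficit reported in the abstract.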
Pages: 1085-1097
Number of pages: 13