A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits

Cited by: 14
Authors
Heller, Rasmus [1]
Nursyifa, Casia [1]
Garcia-Erill, Genis [1]
Salmona, Jordi [2]
Chikhi, Lounes [2,3]
Meisner, Jonas [1]
Korneliussen, Thorfinn Sand [4]
Albrechtsen, Anders [1]
Affiliations
[1] Univ Copenhagen, Dept Biol, Sect Computat & RNA Biol, DK-2200 Copenhagen N, Denmark
[2] Univ Paul Sabatier, CNRS, ENFA, UMR 5174 EDB,Lab Evolut & Div Biol, Toulouse, France
[3] Inst Gulbenkian Ciencias, Oeiras, Portugal
[4] Univ Copenhagen, GLOBE Inst, Sect GeoGenet, Copenhagen K, Denmark
Keywords
allelic dropout; genetic diversity; genotype calling; genotype likelihood; RADseq; site frequency spectrum; READ ALIGNMENT; DISCOVERY; ASSOCIATION; DIVERSITY; FRAMEWORK; GENOTYPE; MAPS; SET
DOI
10.1111/1755-0998.13324
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These organisms often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, such software comes with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filtering. Using RADseq data from 28 zebras sequenced to ~8x depth of coverage, we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2-4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently, though slightly, skewed towards rare and invariant alleles. Using simulations and human data, we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
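The mechanism behind the abstract's main finding, that allelic dropout (AD) deflates heterozygosity and pushes sites towards the invariant end of the SFS, can be illustrated with a small Python simulation. This is a sketch under a deliberately simplified, hypothetical model, not the pipeline described above: each of an individual's two haplotypes is assumed to be lost independently with probability d (in real RADseq data, dropout typically arises from mutations in the restriction site and is shared by every carrier of the affected haplotype), allele frequencies are drawn uniformly per site, and all function names are illustrative.

```python
import random
from collections import Counter


def simulate_genotypes(n_ind, n_sites, rng):
    """Draw diploid genotypes (number of derived alleles: 0, 1 or 2) under
    Hardy-Weinberg equilibrium, with the derived-allele frequency drawn
    uniformly per site (an illustrative assumption)."""
    sites = []
    for _ in range(n_sites):
        p = rng.random()
        sites.append([int(rng.random() < p) + int(rng.random() < p)
                      for _ in range(n_ind)])
    return sites


def apply_dropout(sites, d, rng):
    """Hypothetical toy AD model: each haplotype is lost independently with
    probability d. One lost haplotype -> called homozygous for the surviving
    allele; both lost -> missing genotype (None)."""
    observed = []
    for site in sites:
        calls = []
        for g in site:
            haplotypes = [1] * g + [0] * (2 - g)  # 1 = derived, 0 = ancestral
            kept = [h for h in haplotypes if rng.random() >= d]
            if not kept:
                calls.append(None)  # no reads at all for this individual
            elif len(kept) == 1:
                calls.append(2 * kept[0])  # heterozygote miscalled as homozygote
            else:
                calls.append(sum(kept))
        observed.append(calls)
    return observed


def heterozygosity(sites):
    """Fraction of called (non-missing) genotypes that are heterozygous."""
    called = [g for site in sites for g in site if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def unfolded_sfs(sites):
    """Unfolded SFS: number of sites carrying k derived alleles, k = 0..2n."""
    return Counter(sum(site) for site in sites)


def paired_sfs(true_sites, observed_sites):
    """SFS of true and observed genotypes, restricted to sites with no missing
    calls after dropout, so both spectra are computed over the same sites."""
    keep = [i for i, site in enumerate(observed_sites) if None not in site]
    return (unfolded_sfs([true_sites[i] for i in keep]),
            unfolded_sfs([observed_sites[i] for i in keep]))


if __name__ == "__main__":
    rng = random.Random(1)
    n_ind, d = 28, 0.1
    truth = simulate_genotypes(n_ind, 5000, rng)
    obs = apply_dropout(truth, d, rng)
    deficit = 1 - heterozygosity(obs) / heterozygosity(truth)
    print(f"simulated heterozygosity deficit: {deficit:.3f}")
    print(f"toy-model expectation 2d/(1+d):    {2 * d / (1 + d):.3f}")
    sfs_true, sfs_obs = paired_sfs(truth, obs)
    print("invariant sites, true:    ", sfs_true[0] + sfs_true[2 * n_ind])
    print("invariant sites, observed:", sfs_obs[0] + sfs_obs[2 * n_ind])
```

Under this toy model, a true heterozygote keeps both haplotypes with probability (1-d)^2, and any genotype is called at all with probability 1-d^2, so the expected observed heterozygosity is H(1-d)/(1+d), a deficit of 2d/(1+d). Dropout never turns a homozygote into a heterozygote, so every truly invariant site stays invariant, while some low-frequency sites whose only heterozygotes lose the derived allele become invariant. This produces the skew towards invariant sites.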
Pages: 1085-1097
Number of pages: 13