A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits

Cited by: 14

Authors:

Heller, Rasmus [1]

Nursyifa, Casia [1]

Garcia-Erill, Genis [1]

Salmona, Jordi [2]

Chikhi, Lounes [2,3]

Meisner, Jonas [1]

Korneliussen, Thorfinn Sand [4]

Albrechtsen, Anders [1]

Affiliations:

[1] Univ Copenhagen, Dept Biol, Sect Computat & RNA Biol, DK-2200 Copenhagen N, Denmark

[2] Univ Paul Sabatier, CNRS, ENFA, UMR 5174 EDB, Lab Evolut & Div Biol, Toulouse, France

[3] Inst Gulbenkian Ciencias, Oeiras, Portugal

[4] Univ Copenhagen, GLOBE Inst, Sect GeoGenet, Copenhagen K, Denmark

Source:

MOLECULAR ECOLOGY RESOURCES | 2021 / Vol. 21 / Issue 04

Keywords:

allelic dropout; genetic diversity; genotype calling; genotype likelihood; RADseq; site frequency spectrum; READ ALIGNMENT; DISCOVERY; ASSOCIATION; DIVERSITY; FRAMEWORK; GENOTYPE; MAPS; SET;

DOI:

10.1111/1755-0998.13324

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These organisms often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, such software comes with limitations compared to generic next-generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth-of-coverage, we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2-4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data, we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
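The abstract leans on two summary statistics, the site frequency spectrum and the heterozygosity deficit. The Python sketch below is a minimal illustration of how allelic dropout can shift both. It assumes hard-called diploid genotypes coded as alternate-allele counts (0, 1, 2), which is a simplification and not the genotype-likelihood approach the abstract describes. The function names and the toy genotype matrices are hypothetical and not taken from the paper.

```python
def folded_sfs(genotypes):
    """Folded site frequency spectrum from called diploid genotypes.

    genotypes: list of sites; each site is a list of per-sample
    alternate-allele counts (0, 1 or 2). Hypothetical input format.
    Returns a list where index k holds the number of sites whose
    minor allele is observed k times.
    """
    n_chrom = 2 * len(genotypes[0])
    sfs = [0] * (n_chrom // 2 + 1)
    for site in genotypes:
        alt = sum(site)
        minor = min(alt, n_chrom - alt)  # fold: count the rarer allele
        sfs[minor] += 1
    return sfs


def mean_heterozygosity(genotypes):
    """Fraction of all genotype calls that are heterozygous (count == 1)."""
    n_calls = len(genotypes) * len(genotypes[0])
    het_calls = sum(g == 1 for site in genotypes for g in site)
    return het_calls / n_calls


def heterozygosity_deficit(h_observed, h_reference):
    """Relative loss of heterozygosity compared with a reference estimate."""
    return 1.0 - h_observed / h_reference


# Toy data (invented for illustration): 4 sites x 3 samples.
truth = [[0, 0, 0], [0, 1, 0], [1, 1, 2], [2, 2, 2]]
# Same sites, but the single heterozygote at site 2 lost one allele
# to dropout and is called as a homozygote.
with_dropout = [[0, 0, 0], [0, 0, 0], [1, 1, 2], [2, 2, 2]]

sfs_truth = folded_sfs(truth)
sfs_dropout = folded_sfs(with_dropout)
deficit = heterozygosity_deficit(
    mean_heterozygosity(with_dropout), mean_heterozygosity(truth)
)
```

In this toy case the dropout moves one site from the singleton class into the invariant class and lowers the mean heterozygosity, which is the direction of bias the abstract reports.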


Pages: 1085-1097

Page count: 13
