Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

Cited by: 15
Authors
Cherukuri, Yesesri [1]
Janga, Sarath Chandra [1, 2, 3]
Affiliations
[1] Indiana Univ Purdue Univ, Sch Informat & Comp, Dept Bio Hlth Informat, 719 Indiana Ave Ste 319, Walker Plaza Bldg, Indianapolis, IN 46202 USA
[2] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, HITS 5021, 410 West 10th St, Indianapolis, IN 46202 USA
[3] Indiana Univ Sch Med, Dept Med & Mol Genet, Med Res & Lib Bldg, 975 West Walnut St, Indianapolis, IN 46202 USA
Source
BMC GENOMICS | 2016 / Volume 17
Keywords
Contigs; De novo assembly; De Bruijn; Greedy Extension graph; MinION (R); Nanopore; N50; Oxford Nanopore; HYBRID ERROR-CORRECTION; SEQUENCING TECHNOLOGIES; GENOME; READS; LONG;
DOI
10.1186/s12864-016-2895-8
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This became possible through the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of Nanopore-based single-molecule sequencing technology was developed around the MinION (R) sequencer, which is portable, inexpensive, and fast, and can generate reads longer than 100 kb. Despite these advantages, MinION reads have two major limitations: high error rates and the need to develop downstream pipelines. Algorithms for error correction have already emerged, while pipeline development is still at a nascent stage.
Results: In this study, we benchmarked available assembly algorithms to find an appropriate framework that can efficiently assemble Nanopore-sequenced reads. To this end, we used genome-scale Nanopore datasets available for the E. coli and yeast genomes. To evaluate multiple algorithmic frameworks comprehensively, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera), and greedy extension (SSAKE). We analyzed the quality and accuracy of the assemblies as well as the computational performance of each assembler. Our analysis showed that the OLC-based algorithm, Celera, generated a high-quality assembly with ten times higher N50 and mean contig values, and one-fifth the total number of contigs, compared to the other tools. Celera also exhibited an average genome coverage of 12 % on the E. coli dataset and 70 % on the yeast dataset, with relatively short run times. In contrast, the de Bruijn graph-based assemblers Velvet and ABySS generated assemblies of moderate quality in less time when memory allocation was not limited, while the greedy extension-based algorithm SSAKE generated an assembly of very poor quality but with a genome coverage of 90 % on the yeast dataset.
Conclusion: OLC can be considered a favorable algorithmic framework for developing assemblers for Nanopore-based data, followed by de Bruijn-based algorithms, as these consume less or similar run time compared with OLC-based algorithms, irrespective of the memory allocated to the task. However, a few improvements must be made to the existing de Bruijn implementations to generate assemblies of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data.
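The abstract compares assemblers by N50, mean contig length, and total contig count. As a rough illustration of how such summary statistics are commonly computed from a list of contig lengths, here is a minimal Python sketch. The function names and the toy lengths are assumptions made for illustration only; they are not taken from the paper, and the authors' own evaluation scripts may differ.

```python
def n50(contig_lengths):
    """Return the N50: the length L of the contig at which the running sum
    of lengths (sorted longest first) reaches at least half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def mean_contig_length(contig_lengths):
    """Arithmetic mean of contig lengths (0.0 for an empty assembly)."""
    if not contig_lengths:
        return 0.0
    return sum(contig_lengths) / len(contig_lengths)


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, not data from the study.
    toy_contigs = [80, 70, 50, 40, 30, 20]
    print("contigs:", len(toy_contigs))
    print("N50:", n50(toy_contigs))
    print("mean length:", mean_contig_length(toy_contigs))
```

A higher N50 together with a lower contig count generally indicates a more contiguous assembly, which is the sense in which the abstract reports Celera outperforming the other tools.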
Pages: 11