Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

Cited by: 15
Authors
Cherukuri, Yesesri [1 ]
Janga, Sarath Chandra [1 ,2 ,3 ]
Affiliations
[1] Indiana Univ Purdue Univ, Sch Informat & Comp, Dept Bio Hlth Informat, 719 Indiana Ave Ste 319,Walker Plaza Bldg, Indianapolis, IN 46202 USA
[2] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, HITS 5021, 410 West 10th St, Indianapolis, IN 46202 USA
[3] Indiana Univ Sch Med, Dept Med & Mol Genet, Med Res & Lib Bldg,975 West Walnut St, Indianapolis, IN 46202 USA
Source
BMC GENOMICS | 2016 / Vol. 17
Keywords
Contigs; De novo assembly; De Bruijn; Greedy Extension graph; MinION (R); Nanopore; N50; Oxford Nanopore; HYBRID ERROR-CORRECTION; SEQUENCING TECHNOLOGIES; GENOME; READS; LONG;
DOI
10.1186/s12864-016-2895-8
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This became possible through the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of Nanopore-based single-molecule sequencing technology was developed around the MinION (R) sequencer, which is portable, inexpensive and fast, and is capable of generating reads longer than 100 kb. Despite these advantages, MinION reads have two major limitations: high error rates and the lack of mature downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage.
Results: In this study, we benchmarked available assembly algorithms to identify a framework that can efficiently assemble Nanopore reads. To this end, we used genome-scale Nanopore datasets available for the E. coli and yeast genomes. To evaluate multiple algorithmic frameworks comprehensively, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and greedy extension (SSAKE). We analyzed the quality and accuracy of the assemblies as well as the computational performance of each assembler. Our analysis showed that the OLC-based algorithm, Celera, generated a high-quality assembly with ten times higher N50 and mean contig values and one-fifth the total number of contigs compared to the other tools. Celera also exhibited an average genome coverage of 12 % on the E. coli dataset and 70 % on the yeast dataset, along with relatively short run times. In contrast, the de Bruijn graph based assemblers Velvet and ABySS generated assemblies of moderate quality in less time when memory allocation was not limited, while the greedy extension based algorithm SSAKE generated an assembly of very poor quality but with a genome coverage of 90 % on the yeast dataset.
Conclusion: OLC can be considered a favorable algorithmic framework for developing assemblers for Nanopore data, followed by de Bruijn based algorithms, which require less than or similar run times to OLC-based algorithms to generate an assembly, irrespective of the memory allocated for the task. However, a few improvements must be made to the existing de Bruijn implementations before they can generate assemblies of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data.
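The abstract compares assemblies by N50 and mean contig length. As a minimal sketch of how these two statistics are commonly computed (the function names and the contig lengths below are hypothetical and are not taken from the paper), N50 is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L
    together account for at least half of the total assembly length.
    Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


def mean_contig_length(contig_lengths):
    """Return the arithmetic mean contig length (0.0 if there are no contigs)."""
    if not contig_lengths:
        return 0.0
    return sum(contig_lengths) / len(contig_lengths)


# Hypothetical contig lengths in base pairs, for illustration only.
example = [100, 80, 50, 30, 20, 10]
print(n50(example))                 # N50 of the example assembly
print(mean_contig_length(example))  # mean contig length of the example assembly
```

A higher N50 together with fewer contigs, as reported for Celera, indicates that more of the assembly is contained in long contiguous sequences.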
Pages: 11
Related papers
50 in total
  • [31] De novo genome assembly of the potent medicinal plant Rehmannia glutinosa using nanopore technology
    Ma, Ligang
    Dong, Chengming
    Song, Chi
    Wang, Xiaolan
    Zheng, Xiaoke
    Niu, Yan
    Chen, Shilin
    Feng, Weisheng
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 3954 - 3963
  • [32] Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes
    Kishwar Shafin
    Trevor Pesout
    Ryan Lorig-Roach
    Marina Haukness
    Hugh E. Olsen
    Colleen Bosworth
    Joel Armstrong
    Kristof Tigyi
    Nicholas Maurer
    Sergey Koren
    Fritz J. Sedlazeck
    Tobias Marschall
    Simon Mayes
    Vania Costa
    Justin M. Zook
    Kelvin J. Liu
    Duncan Kilburn
    Melanie Sorensen
    Katy M. Munson
    Mitchell R. Vollger
    Jean Monlong
    Erik Garrison
    Evan E. Eichler
    Sofie Salama
    David Haussler
    Richard E. Green
    Mark Akeson
    Adam Phillippy
    Karen H. Miga
    Paolo Carnevali
    Miten Jain
    Benedict Paten
    Nature Biotechnology, 2020, 38 : 1044 - 1053
  • [33] Nanopore Sequencing and Hi-C Based De Novo Assembly of Trachidermus fasciatus Genome
    Xie, Gangcai
    Zhang, Xu
    Lv, Feng
    Sang, Mengmeng
    Hu, Hairong
    Wang, Jinqiu
    Liu, Dong
    GENES, 2021, 12 (05)
  • [34] De Novo Genome Assembly for an Endangered Lemur Using Portable Nanopore Sequencing in Rural Madagascar
    Hauff, Lindsey
    Rasoanaivo, Noa Elosmie
    Razafindrakoto, Andriamahery
    Ravelonjanahary, Hajanirina
    Wright, Patricia C.
    Rakotoarivony, Rindra
    Bergey, Christina M.
    ECOLOGY AND EVOLUTION, 2025, 15 (01):
  • [35] Characterization, correction and de novo assembly of an Oxford Nanopore genomic dataset from Agrobacterium tumefaciens
    Stéphane Deschamps
    Joann Mudge
    Connor Cameron
    Thiruvarangan Ramaraj
    Ajith Anand
    Kevin Fengler
    Kevin Hayes
    Victor Llaca
    Todd J. Jones
    Gregory May
    Scientific Reports, 6
  • [36] De novo assembly and annotation of three Leptosphaeria genomes using Oxford Nanopore MinION sequencing
    Fabien Dutreux
    Corinne Da Silva
    Léo d’Agata
    Arnaud Couloux
    Elise J. Gay
    Benjamin Istace
    Nicolas Lapalu
    Arnaud Lemainque
    Juliette Linglin
    Benjamin Noel
    Patrick Wincker
    Corinne Cruaud
    Thierry Rouxel
    Marie-Hélène Balesdent
    Jean-Marc Aury
    Scientific Data, 5
  • [37] Rapid centriole assembly in Naegleria reveals conserved roles for both de novo and mentored assembly
    Fritz-Laylin, Lillian K.
    Levy, Yaron Y.
    Levitan, Edward
    Chen, Sean
    Cande, W. Zacheus
    Lai, Elaine Y.
    Fulton, Chandler
    CYTOSKELETON, 2016, 73 (03) : 109 - 116
  • [38] De Novo Assembly Methods for Next Generation Sequencing Data
    He, Yiming
    Zhang, Zhen
    Peng, Xiaoqing
    Wu, Fangxiang
    Wang, Jianxin
    TSINGHUA SCIENCE AND TECHNOLOGY, 2013, 18 (05) : 500 - 514
  • [39] Data of de novo assembly of the leaf transcriptome in Aegle marmelos
    Kaushik, Prashant
    Kumar, Shashi
    DATA IN BRIEF, 2018, 19 : 700 - 703
  • [40] De Novo Assembly Methods for Next Generation Sequencing Data
    Yiming He
    Zhen Zhang
    Xiaoqing Peng
    Fangxiang Wu
    Jianxin Wang
    Tsinghua Science and Technology, 2013, 18 (05) : 500 - 514