Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

Cited by: 15
Authors
Cherukuri, Yesesri [1]
Janga, Sarath Chandra [1, 2, 3]
Affiliations
[1] Indiana Univ Purdue Univ, Sch Informat & Comp, Dept Bio Hlth Informat, 719 Indiana Ave Ste 319, Walker Plaza Bldg, Indianapolis, IN 46202 USA
[2] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, HITS 5021, 410 West 10th St, Indianapolis, IN 46202 USA
[3] Indiana Univ Sch Med, Dept Med & Mol Genet, Med Res & Lib Bldg, 975 West Walnut St, Indianapolis, IN 46202 USA
Source
BMC GENOMICS | 2016 / Vol. 17
Keywords
Contigs; De novo assembly; De Bruijn; Greedy Extension graph; MinION (R); Nanopore; N50; Oxford Nanopore; HYBRID ERROR-CORRECTION; SEQUENCING TECHNOLOGIES; GENOME; READS; LONG
DOI
10.1186/s12864-016-2895-8
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This became possible through the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of Nanopore-based single-molecule sequencing technology was developed around the MinION (R) sequencer, which is portable, inexpensive, and fast, and can generate reads longer than 100 kb. Despite these advantages, MinION reads have two major limitations: high error rates and the need for downstream analysis pipelines. Error-correction algorithms have already emerged, while pipeline development is still at a nascent stage.

Results: In this study, we benchmarked available assembly algorithms to find an appropriate framework that can efficiently assemble Nanopore reads. To this end, we used genome-scale Nanopore datasets available for the E. coli and yeast genomes. To evaluate multiple algorithmic frameworks comprehensively, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera), and greedy extension (SSAKE). We analyzed the quality and accuracy of the assemblies as well as the computational performance of each assembler. Our analysis showed that the OLC-based assembler Celera could generate a high-quality assembly, with ten times higher N50 and mean contig values and one-fifth the total number of contigs compared to the other tools. Celera also exhibited an average genome coverage of 12 % on the E. coli dataset and 70 % on the yeast dataset, as well as relatively shorter run times. In contrast, the de Bruijn graph-based assemblers Velvet and ABySS generated assemblies of moderate quality in less time when memory allocation was not limited, while the greedy extension-based algorithm SSAKE generated an assembly of very poor quality but with a genome coverage of 90 % on the yeast dataset.

Conclusion: OLC can be considered a favorable algorithmic framework for developing assemblers for Nanopore data, followed by de Bruijn-based algorithms, which take less or similar run time compared with OLC-based algorithms to generate an assembly, irrespective of the memory allocated for the task. However, a few improvements must be made to existing de Bruijn implementations before they can generate assemblies of reasonable quality. Our findings should help stimulate the development of novel assemblers for Nanopore sequence data.
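
The contig statistics named in the abstract (number of contigs, mean contig length, N50) can all be derived from a list of contig lengths. The following is a minimal sketch of the standard N50 definition, not the authors' evaluation pipeline; the function names `n50` and `summarize` and the toy lengths in the usage note are assumptions made for illustration.

```python
from statistics import mean


def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking from the
    longest contig downward, the running total first reaches at
    least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def summarize(contig_lengths):
    """Collect the three contig statistics mentioned in the abstract."""
    return {
        "num_contigs": len(contig_lengths),
        "mean_length": mean(contig_lengths),
        "n50": n50(contig_lengths),
    }
```

For hypothetical contig lengths `[80, 70, 50, 40, 30, 20]` (total 290), the two longest contigs already cover 150 bases, which is at least half the total, so `n50` returns 70. A fragmented assembly with many short contigs has a lower N50 and mean length, which is the sense in which the abstract compares assemblers.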
Pages: 11