cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs

Cited by: 18
Authors
Tolstoganov, Ivan [1]
Bankevich, Anton [2]
Chen, Zhoutao [3]
Pevzner, Pavel A. [1, 2]
Affiliations
[1] St Petersburg State Univ, Inst Translat Biomed, Ctr Algorithm Biotechnol, St Petersburg, Russia
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[3] Universal Sequencing Technol Corp, Carlsbad, CA USA
Funding
Russian Science Foundation
Keywords
DNA EXTRACTION; GENOME; ACCURATE;
DOI
10.1093/bioinformatics/btz349
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: The recently developed barcoding-based synthetic long read (SLR) technologies have already found many applications in genome assembly and analysis. However, although some new barcoding protocols are emerging and the range of SLR applications is being expanded, the existing SLR assemblers are optimized for a narrow range of parameters and are not easily extendable to new barcoding technologies and new applications such as metagenomics or hybrid assembly.
Results: We describe the algorithmic challenge of the SLR assembly and present a cloudSPAdes algorithm for SLR assembly that is based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies/applications and demonstrated that it improves on the state-of-the-art SLR assemblers in accuracy and speed.
Availability and implementation: Source code and installation manual for cloudSPAdes are available at https://github.com/ablab/spades/releases/tag/cloudspades-paper.
Supplementary information: Supplementary data are available at Bioinformatics online.
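To make the phrase "de Bruijn graph of SLRs" concrete, the following is a minimal illustrative sketch, not the cloudSPAdes implementation. It assumes each read arrives as a (barcode, sequence) pair, builds the de Bruijn graph of the reads for a chosen k, and records which barcodes support each edge. The function names `build_barcoded_de_bruijn_graph` and `shared_barcodes`, the toy reads, and the value of k are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_barcoded_de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from barcoded reads.

    reads: iterable of (barcode, sequence) pairs (hypothetical input format).
    Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix.
    Returns a dict mapping each edge to the set of barcodes whose reads
    contain the corresponding k-mer.
    """
    edges = defaultdict(set)
    for barcode, seq in reads:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(barcode)
    return dict(edges)


def shared_barcodes(edges, edge_a, edge_b):
    """Return the barcodes supporting both edges (empty set if none)."""
    return edges.get(edge_a, set()) & edges.get(edge_b, set())


# Toy input: three short reads from two barcodes (made-up data).
reads = [("BC1", "ACGTAC"), ("BC2", "CGTACG"), ("BC1", "GTACGA")]
graph = build_barcoded_de_bruijn_graph(reads, k=4)

for (src, dst), barcodes in sorted(graph.items()):
    print(f"{src} -> {dst}: {sorted(barcodes)}")
```

Keeping a barcode set on every edge is one simple way to ask whether two distant edges are likely to come from the same long fragment: edges that share barcodes are candidates for being linked. A real assembler would use condensed edges, error correction, and statistical thresholds rather than a plain set intersection.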
Pages: I61 - I70
Number of pages: 10