Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations

Cited by: 7
Authors
Cosma, Bianca-Maria [1]
Zade, Ramin Shirali Hossein [1]
Jordan, Erin Noel [1,2]
van Lent, Paul [1]
Peng, Chengyao [1]
Pillay, Stephanie [1]
Abeel, Thomas [1,3]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, Intelligent Syst, NL-2628 XE Delft, Netherlands
[2] TU Dortmund Univ, Tech Biochem, D-44227 Dortmund, Germany
[3] Broad Inst MIT & Harvard, Infect Dis & Microbiome Program, Cambridge, MA 02142 USA
Source
GIGASCIENCE | 2023 / Vol. 12
Keywords
de novo assembly; third-generation sequencing; benchmarking; eukaryote genomes
DOI
10.1093/gigascience/giad100
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the disadvantages of short read lengths specific to next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, thereby generating a new set of challenges for assembly algorithms and pipelines. However, the introduction of HiFi reads, which offer substantially reduced error rates, has provided a promising solution for more accurate assembly outcomes. Since the introduction of third-generation sequencing technologies, many tools have been developed that aim to take advantage of the longer reads, and researchers need to choose the correct assembler for their projects.
Results: We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we used 12 real and 64 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio continuous long-read (CLR), PacBio high-fidelity (HiFi), and ONT sequencing to evaluate the assemblers. We include 5 commonly used long-read assemblers in our benchmark: Canu, Flye, Miniasm, Raven, and wtdbg2 for ONT and PacBio CLR reads. For PacBio HiFi reads, we include 5 state-of-the-art HiFi assemblers: HiCanu, Flye, Hifiasm, LJA, and MBG. Evaluation categories address the following metrics: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on the quality of the assemblies and report that read length can, but does not always, positively impact assembly quality.
Conclusions: Our benchmark concludes that there is no assembler that performs the best in all the evaluation categories. However, our results show that overall Flye is the best-performing assembler for PacBio CLR and ONT reads, both on real and simulated data. Meanwhile, the best-performing PacBio HiFi assemblers are Hifiasm and LJA. Next, the benchmarking using longer reads shows that the increased read length improves assembly quality, but the extent to which that can be achieved depends on the size and complexity of the reference genome.
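The abstract lists "assembly statistics" among the evaluation categories without naming the individual statistics. As an illustration only, the sketch below computes N50, a contiguity statistic commonly reported for genome assemblies. The function name `n50` and the example contig lengths are hypothetical and are not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length
    # until at least half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bases), for illustration only.
example = [100, 80, 60, 40, 20]
print(n50(example))  # prints 80
```

In this toy input the total length is 300; the two longest contigs (100 + 80) already cover more than half of it, so the N50 is 80. A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why such statistics are usually read alongside reference-based metrics and misassembly counts.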
Pages: 12