Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops

Cited by: 17
Authors
Jung, Hyungtaek [1]
Jeon, Min-Seung [2]
Hodgett, Matthew [3]
Waterhouse, Peter [1]
Eyun, Seong-il [2]
Affiliations
[1] Queensland Univ Technol, Ctr Agr & Biocommod, Brisbane, Qld 4001, Australia
[2] Chung Ang Univ, Dept Life Sci, Seoul 06974, South Korea
[3] Queensland Univ Technol, Informat Technol Serv, Brisbane, Qld 4001, Australia
Funding
Australian Research Council
Keywords
plant genome; next-generation sequencing; Pacific Biosciences; long reads; nanopore; assemblers;
DOI
10.1021/acs.jafc.0c01647
Chinese Library Classification number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Recent state-of-the-art long-read sequencing technologies have made it significantly easier and faster to produce high-quality plant genome assemblies. A wide variety of genome-related software tools are now available, but they are typically benchmarked on microbial or model eukaryotic genomes such as Arabidopsis and rice. Many plant species have much larger and more complex genomes than these, and the best choice of tools, parameters, and strategies is not always obvious. We therefore compared the metrics of assemblies generated by various pipelines to examine how assembly quality is affected by two different assembly strategies. First, we optimized read preprocessing and assembler variables using eight de novo assemblers on five Pacific Biosciences long-read datasets from diploid and tetraploid species. Second, we examined a single scaffolding tool (quickmerge) in the postprocessing step, merging the outputs of multiple assemblies to produce a higher-quality consensus assembly. We then benchmarked the assemblies for completeness and accuracy (assembly metrics and BUSCO), memory usage, and CPU time. Two lightweight assemblers, Miniasm/Minimap/Racon and WTDBG, were judged suitable for novice users because they have gentler learning curves and require fewer computational resources. However, two heavyweight tools, CANU and Flye, should be the first choice when the goal is an accurate and complete assembly. Our results will provide valuable guidance for future plant genome projects and beyond.
Pages: 7670-7677
Number of pages: 8