Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops

Cited by: 17
Authors
Jung, Hyungtaek [1]
Jeon, Min-Seung [2]
Hodgett, Matthew [3]
Waterhouse, Peter [1]
Eyun, Seong-il [2]
Affiliations
[1] Queensland Univ Technol, Ctr Agr & Biocommod, Brisbane, Qld 4001, Australia
[2] Chung Ang Univ, Dept Life Sci, Seoul 06974, South Korea
[3] Queensland Univ Technol, Informat Technol Serv, Brisbane, Qld 4001, Australia
Funding
Australian Research Council
Keywords
plant genome; next-generation sequencing; Pacific Biosciences; long reads; nanopore; assemblers;
DOI
10.1021/acs.jafc.0c01647
Chinese Library Classification number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
The availability of recent state-of-the-art long-read sequencing technologies has significantly increased the ease and speed of producing high-quality plant genome assemblies. A wide variety of genome-related software tools are now available and they are typically benchmarked using microbial or model eukaryotic genomes such as Arabidopsis and rice. However, many plant species have much larger and more complex genomes than these, and the choice of tools, parameters, and/or strategies that can be used is not always obvious. Thus, we have compared the metrics of assemblies generated by various pipelines to discuss how assembly quality can be affected by two different assembly strategies. First, we focused on optimizing read preprocessing and assembler variables using eight different de novo assemblers on five different Pacific Biosciences long-read datasets of diploid and tetraploid species. Then, we examined a single scaffolding tool (quickmerge) that has been employed for the postprocessing step. We then merged the outputs from multiple assemblies to produce a higher quality consensus assembly. Then, we benchmarked the assemblies for completeness and accuracy (assembly metrics and BUSCO), computer memory, and CPU times. Two lightweight assemblers, Miniasm/Minimap/Racon and WTDBG, were deemed good for novice users because they involved smaller required learning curves and light computational resources. However, two heavyweight tools, CANU and Flye, should be the first choice when the goal is to achieve accurate and complete assemblies. Our results will provide valuable guidance in future plant genome projects and beyond.
Pages: 7670 - 7677
Page count: 8
Related papers
50 in total
  • [21] Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data
    Chin, Chen-Shan
    Alexander, David H.
    Marks, Patrick
    Klammer, Aaron A.
    Drake, James
    Heiner, Cheryl
    Clum, Alicia
    Copeland, Alex
    Huddleston, John
    Eichler, Evan E.
    Turner, Stephen W.
    Korlach, Jonas
    NATURE METHODS, 2013, 10 (06) : 563 - +
  • [22] Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data
    Chin C.-S.
    Alexander D.H.
    Marks P.
    Klammer A.A.
    Drake J.
    Heiner C.
    Clum A.
    Copeland A.
    Huddleston J.
    Eichler E.E.
    Turner S.W.
    Korlach J.
    Nature Methods, 2013, 10 (6) : 563 - 569
  • [23] Long-read sequencing uncovers the adaptive topography of a carnivorous plant genome
    Lan, Tianying
    Renner, Tanya
    Ibarra-Laclette, Enrique
    Farr, Kimberly M.
    Chang, Tien-Hao
    Alan Cervantes-Perez, Sergio
    Zheng, Chunfang
    Sankoff, David
    Tang, Haibao
    Purbojati, Rikky W.
    Putra, Alexander
    Drautz-Moses, Daniela I.
    Schuster, Stephan C.
    Herrera-Estrella, Luis
    Albert, Victor A.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2017, 114 (22) : E4435 - E4441
  • [24] Long-read whole-genome sequencing for the genetic diagnosis of dystrophinopathies
    Xie, Zhiying
    Sun, Chengyue
    Zhang, Siwen
    Liu, Yilin
    Yu, Meng
    Zheng, Yiming
    Meng, Lingchao
    Acharya, Anushree
    Cornejo-Sanchez, Diana M.
    Wang, Gao
    Zhang, Wei
    Schrauwen, Isabelle
    Leal, Suzanne M.
    Wang, Zhaoxia
    Yuan, Yun
    ANNALS OF CLINICAL AND TRANSLATIONAL NEUROLOGY, 2020, 7 (10) : 2041 - 2046
  • [25] Long-read genome sequencing informs the molecular etiology of imprinting disorders
    Dixon, Katherine
    Shen, Yaoqing
    Chin, Hui-Lin
    Gazzaz, Nour
    Huynh, Stephanie
    Chan, Simon
    Zhang, Cathy
    Culibrk, Luka
    O'Neill, Kieran
    Mungall, Karen
    Mungall, Andrew
    Moore, Richard
    Gibson, William
    Chanoine, Jean-Pierre
    Boerkoel, Cornelius
    Jones, Steven
    GENETICS IN MEDICINE, 2022, 24 (03) : S214 - S215
  • [26] Long-read sequencing and de novo assembly of the cynomolgus macaque genome
    Bai, Bing
    Wang, Yi
    Zhu, Ran
    Zhang, Yaolei
    Wang, Hong
    Fan, Guangyi
    Liu, Xin
    Shi, Hong
    Niu, Yuyu
    Ji, Weizhi
    JOURNAL OF GENETICS AND GENOMICS, 2022, 49 (10) : 975 - 978
  • [27] Whole Genome Assembly of Human Papillomavirus by Nanopore Long-Read Sequencing
    Yang, Shuaibing
    Zhao, Qianqian
    Tang, Lihua
    Chen, Zejia
    Wu, Zhaoting
    Li, Kaixin
    Lin, Ruoru
    Chen, Yang
    Ou, Danlin
    Zhou, Li
    Xu, Jianzhen
    Qin, Qingsong
    FRONTIERS IN GENETICS, 2022, 12
  • [28] Long-read sequencing and de novo assembly of the cynomolgus macaque genome
    Bing Bai
    Yi Wang
    Ran Zhu
    Yaolei Zhang
    Hong Wang
    Guangyi Fan
    Xin Liu
    Hong Shi
    Yuyu Niu
    Weizhi Ji
    Journal of Genetics and Genomics, 2022, 49 (10) : 975 - 978
  • [29] Improved contiguity of the threespine stickleback genome using long-read sequencing
    Nath, Shivangi
    Shaw, Daniel E.
    White, Michael A.
    G3-GENES GENOMES GENETICS, 2021, 11 (02):
  • [30] Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi
    Batty, Elizabeth M.
    Chaemchuen, Suwittra
    Blacksell, Stuart
    Richards, Allen L.
    Paris, Daniel
    Bowden, Rory
    Chan, Caroline
    Lachumanan, Ramkumar
    Day, Nicholas
    Donnelly, Peter
    Chen, Swaine
    Salje, Jeanne
    PLOS NEGLECTED TROPICAL DISEASES, 2018, 12 (06):