Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly

Cited by: 16
Authors
Gavrielatos, Marios [1, 2]
Kyriakidis, Konstantinos [3, 4]
Spandidos, Demetrios A. [5]
Michalopoulos, Ioannis [1]
Affiliations
[1] Acad Athens, Biomed Res Fdn, Ctr Syst Biol, 4 Soranou Efessiou, Athens 11527, Greece
[2] Univ Athens, Fac Biol, Dept Cell Biol & Biophys, Athens 15701, Greece
[3] Aristotle Univ Thessaloniki AUTh, Sch Pharm, Thessaloniki 54124, Greece
[4] Ctr Interdisciplinary Res & Innovat, Genom & Epigen Translat Res GENeTres, Thessaloniki 57001, Greece
[5] Univ Crete, Med Sch, Lab Clin Virol, Iraklion 71003, Greece
Keywords
de novo genome assembly; next generation sequencing; third generation sequencing; genomics; benchmarking; bioinformatics; VARIATION DISCOVERY; NANOPORE; READS; ANNOTATION
DOI
10.3892/mmr.2021.11890
Chinese Library Classification number
R73 [Oncology]
Subject classification code
100214
Abstract
Genome assemblers are computational tools for de novo genome assembly, based on a plenitude of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrence of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has enabled the rise of novel de novo genome assembly strategies. The ultimate goal of such strategies is to utilise the features of each sequencing platform in order to address the existing weaknesses of each sequencing type and compose a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired-end reads and Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy was benchmarked using Canu for Nanopore reads, and using Hifiasm and HiCanu for PacBio HiFi reads. The assemblies were performed on a computational cluster with limited computational resources. Their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperformed the others, while Hi-C scaffolding, which is based on chromatin 3D structure, is required in order to increase contiguity, accuracy and completeness when large and complex genomes, such as the human one, are assembled. The use of Hi-C data is also necessary when using the hybrid assembly strategy. The results revealed that HiFi sequencing enabled the rise of novel algorithms which require less genome coverage than the other strategies, making the assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects, which are now approachable by smaller labs with limited technical and financial resources.
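The abstract says assembly quality is judged partly by contiguity but does not name the statistic used. As an illustration only, the sketch below computes N50, a widely used contiguity measure; the choice of N50 and the function name `n50` are assumptions, not details taken from the record. N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (illustrative helper).

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths, not data from the study.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about misassemblies, which is why the abstract lists both criteria.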
Pages: 12