Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case

Cited by: 45
Authors
Wang, Weiwen [1]
Schalamun, Miriam [1,2]
Morales-Suarez, Alejandro [3]
Kainer, David [1]
Schwessinger, Benjamin [1]
Lanfear, Robert [1]
Affiliations
[1] Australian Natl Univ, Res Sch Biol, Canberra, ACT, Australia
[2] Univ Nat Resources & Life Sci, Inst Appl Genet & Cell Biol, Vienna, Austria
[3] Macquarie Univ, Dept Biol Sci, Sydney, NSW, Australia
Source
BMC Genomics | 2018 / Vol. 19
Funding
Australian Research Council
Keywords
Chloroplast genome; Genome assembly; Polishing; Illumina; Long-reads; Nanopore; HIGH-THROUGHPUT; PLASTID GENOME; DNA INSERTIONS; PHYLOGENY; MITOCHONDRIAL; EVOLUTION; SEQUENCE; ORGANIZATION; GENERATION; VERSATILE;
DOI
10.1186/s12864-018-5348-8
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Chloroplasts are organelles that conduct photosynthesis in plant and algal cells. The information contained in chloroplast genomes is widely used in agriculture and in studies of evolution and ecology. Correctly assembling chloroplast genomes can be challenging because the chloroplast genome contains a pair of long inverted repeats (10-30 kb). Typically, it is simply assumed that the gross structure of the chloroplast genome matches the most commonly observed structure of two single-copy regions separated by a pair of inverted repeats. The advent of long-read sequencing technologies should remove the need to make this assumption by providing sufficient information to completely span the inverted repeat regions. Yet long-reads tend to have higher error rates than short-reads, and relatively little is known about the best way to combine long- and short-reads to obtain the most accurate chloroplast genome assemblies. Using Eucalyptus pauciflora, the snow gum, as a test case, we evaluated the effect of multiple parameters, including the coverage of long (Oxford Nanopore) and short (Illumina) reads, the length of the long-reads, and the assembly pipeline, with a view to determining the most accurate and efficient approach to chloroplast genome assembly.
Results: Hybrid assemblies combining at least 20x coverage of both long-reads and short-reads generated a single contig spanning the entire chloroplast genome with few or no detectable errors. Short-read-only assemblies generated three contigs (the long single copy, short single copy and inverted repeat regions) of the chloroplast genome. These contigs contained few single-base errors but tended to exclude several bases at the beginning or end of each contig. Long-read-only assemblies tended to create multiple contigs with a much higher single-base error rate. The chloroplast genome of Eucalyptus pauciflora is 159,942 bp long and contains 131 genes of known function.
Conclusions: Our results suggest that very accurate assemblies of chloroplast genomes can be achieved using at least 20x coverage of long-reads together with at least 20x coverage of short-reads, provided that the long-reads include at least ~5x coverage from reads longer than the inverted repeat region. We show that further increases in coverage give little or no improvement in accuracy, and that hybrid assemblies are more accurate than long-read-only or short-read-only assemblies.
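The coverage thresholds in the conclusions can be checked against a read set with simple arithmetic: coverage is total sequenced bases divided by genome length. The following is a minimal sketch, not the authors' pipeline. The genome length comes from the abstract; the function names, the example read lengths, and the 26,000 bp inverted repeat length are hypothetical values chosen for illustration (the abstract gives only a general range of 10-30 kb).

```python
from typing import Iterable, Sequence

# Chloroplast genome length of E. pauciflora, as reported in the abstract.
GENOME_LENGTH_BP = 159_942


def coverage(read_lengths: Iterable[int], genome_length: int = GENOME_LENGTH_BP) -> float:
    """Mean depth of coverage: total sequenced bases / genome length."""
    return sum(read_lengths) / genome_length


def spanning_coverage(read_lengths: Iterable[int], ir_length: int,
                      genome_length: int = GENOME_LENGTH_BP) -> float:
    """Coverage contributed only by reads longer than the inverted repeat."""
    return coverage((n for n in read_lengths if n > ir_length), genome_length)


def meets_thresholds(long_reads: Sequence[int], short_reads: Sequence[int],
                     ir_length: int, min_cov: float = 20.0,
                     min_spanning_cov: float = 5.0) -> bool:
    """Check the abstract's suggested minimums: >= 20x of each read type,
    and >= ~5x from long reads that exceed the inverted repeat length."""
    return (coverage(long_reads) >= min_cov
            and coverage(short_reads) >= min_cov
            and spanning_coverage(long_reads, ir_length) >= min_spanning_cov)


if __name__ == "__main__":
    ir_length = 26_000                                # hypothetical IR length (assumption)
    long_reads = [30_000] * 30 + [10_000] * 250       # hypothetical nanopore read lengths
    short_reads = [150] * 22_000                      # hypothetical Illumina read lengths
    print(f"long-read coverage:  {coverage(long_reads):.1f}x")
    print(f"short-read coverage: {coverage(short_reads):.1f}x")
    print(f"IR-spanning coverage: {spanning_coverage(long_reads, ir_length):.1f}x")
    print("meets thresholds:", meets_thresholds(long_reads, short_reads, ir_length))
```

In practice the read lengths would come from the FASTQ files, and the inverted repeat length would need to be estimated for the species in question; this sketch only illustrates the arithmetic behind the thresholds.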
Pages: 15