Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case

Cited by: 45
Authors
Wang, Weiwen [1]
Schalamun, Miriam [1,2]
Morales-Suarez, Alejandro [3]
Kainer, David [1]
Schwessinger, Benjamin [1]
Lanfear, Robert [1]
Affiliations
[1] Australian Natl Univ, Res Sch Biol, Canberra, ACT, Australia
[2] Univ Nat Resources & Life Sci, Inst Appl Genet & Cell Biol, Vienna, Austria
[3] Macquarie Univ, Dept Biol Sci, Sydney, NSW, Australia
Source
BMC Genomics | 2018 / Volume 19
Funding
Australian Research Council
Keywords
Chloroplast genome; Genome assembly; Polishing; Illumina; Long-reads; Nanopore; HIGH-THROUGHPUT; PLASTID GENOME; DNA INSERTIONS; PHYLOGENY; MITOCHONDRIAL; EVOLUTION; SEQUENCE; ORGANIZATION; GENERATION; VERSATILE;
DOI
10.1186/s12864-018-5348-8
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Chloroplasts are organelles that conduct photosynthesis in plant and algal cells. The information contained in chloroplast genomes is widely used in agriculture and in studies of evolution and ecology. Correctly assembling chloroplast genomes can be challenging because the chloroplast genome contains a pair of long inverted repeats (10-30 kb). Typically, it is simply assumed that the gross structure of the chloroplast genome matches the most commonly observed structure of two single-copy regions separated by a pair of inverted repeats. The advent of long-read sequencing technologies should remove the need to make this assumption by providing sufficient information to completely span the inverted repeat regions. Yet long-reads tend to have higher error rates than short-reads, and relatively little is known about the best way to combine long- and short-reads to obtain the most accurate chloroplast genome assemblies. Using Eucalyptus pauciflora, the snow gum, as a test case, we evaluated the effect of multiple parameters, such as different coverage of long (Oxford Nanopore) and short (Illumina) reads, different long-read lengths, and different assembly pipelines, with a view to determining the most accurate and efficient approach to chloroplast genome assembly.
Results: Hybrid assemblies combining at least 20x coverage of both long-reads and short-reads generated a single contig spanning the entire chloroplast genome with few or no detectable errors. Short-read-only assemblies generated three contigs (the long single copy, short single copy and inverted repeat regions) of the chloroplast genome. These contigs contained few single-base errors but tended to exclude several bases at the beginning or end of each contig. Long-read-only assemblies tended to create multiple contigs with a much higher single-base error rate. The chloroplast genome of Eucalyptus pauciflora is 159,942 bp long and contains 131 genes of known function.
Conclusions: Our results suggest that very accurate assemblies of chloroplast genomes can be achieved using at least 20x coverage each of long- and short-reads, provided that the long-reads include at least ~5x coverage of reads longer than the inverted repeat region. We show that further increases in coverage give little or no improvement in accuracy, and that hybrid assemblies are more accurate than long-read-only or short-read-only assemblies.
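The coverage thresholds stated in the abstract reduce to simple arithmetic: coverage is the total number of sequenced bases divided by the genome length. The sketch below is one possible way to check a read set against those thresholds; it is not the authors' pipeline. The function names and the read-length inputs are hypothetical, and the inverted repeat length must be supplied by the user because the abstract gives only the typical 10-30 kb range, not a value for Eucalyptus pauciflora.

```python
from typing import Iterable

# Genome length reported in the abstract for Eucalyptus pauciflora.
GENOME_LENGTH_BP = 159_942


def coverage(read_lengths: Iterable[int], genome_length: int = GENOME_LENGTH_BP) -> float:
    """Mean coverage = total sequenced bases / genome length."""
    return sum(read_lengths) / genome_length


def meets_hybrid_guideline(
    long_read_lengths: Iterable[int],
    short_read_lengths: Iterable[int],
    inverted_repeat_bp: int,  # assumption: supplied by the user, not given in the abstract
    genome_length: int = GENOME_LENGTH_BP,
    min_coverage: float = 20.0,          # "at least 20x" of long- and short-reads each
    min_spanning_coverage: float = 5.0,  # "~5x" of reads longer than the inverted repeat
) -> bool:
    """Check a read set against the coverage thresholds stated in the abstract."""
    long_reads = list(long_read_lengths)
    # Long reads that could span the whole inverted repeat region.
    spanning = [n for n in long_reads if n > inverted_repeat_bp]
    return (
        coverage(long_reads, genome_length) >= min_coverage
        and coverage(short_read_lengths, genome_length) >= min_coverage
        and coverage(spanning, genome_length) >= min_spanning_coverage
    )


# Hypothetical read sets for illustration only.
example_long = [30_000] * 40 + [8_000] * 300   # about 22.5x in total
example_short = [150] * 25_000                 # about 23.4x
example_ok = meets_hybrid_guideline(example_long, example_short, inverted_repeat_bp=26_000)
```

In this illustration the 40 reads of 30,000 bp give roughly 7.5x coverage from reads longer than the assumed 26,000 bp repeat, so all three thresholds are met. The 26,000 bp figure is a placeholder within the 10-30 kb range and should be replaced with the measured repeat length of the genome being assembled.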
Pages: 15