De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

Times cited: 11
Authors
Izan, Shairul [1, 2]
Esselink, Danny [1]
Visser, Richard G. F. [1]
Smulders, Marinus J. M. [1]
Borm, Theo [1]
Affiliations
[1] Wageningen Univ & Res, Plant Breeding, Wageningen, Netherlands
[2] Univ Putra Malaysia, Dept Crop Sci, Fac Agr, Serdang, Malaysia
Keywords
chloroplast genome; de novo assembly; Solanum; Aegilops; Paphiopedilum; DNA sequencing; whole genome shotgun sequencing; k-mer analysis; plastid genomes; phylogenomics; angiosperms; evolution; resolve
DOI
10.3389/fpls.2017.01271
Chinese Library Classification number
Q94 [Botany]
Subject classification code
071001
Abstract
Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Until now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes, especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assembled these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps left by coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short-read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
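The abstract does not state the pipeline's exact selection rule, so the following is only a minimal sketch of the general idea behind k-mer frequency-based read selection. It assumes a hypothetical rule: build a table of canonical k-mer counts over all reads, then keep reads whose median k-mer count falls inside a window. Chloroplast DNA is present in many copies per cell, so its k-mers are far more abundant than those of single-copy nuclear sequence. The function names and the `min_count`/`max_count` thresholds are illustrative assumptions, not the authors' code; real WGS datasets would call for a dedicated k-mer counting tool rather than a Python `Counter`.

```python
from collections import Counter
from statistics import median

# Complement table for the DNA alphabet (reads assumed to contain only A/C/G/T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement, so reads
    sequenced from either strand contribute to the same table entry."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads: list[str], k: int) -> Counter:
    """Build a canonical k-mer frequency table over all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def select_high_coverage_reads(reads, k, min_count, max_count=None):
    """Keep reads whose median k-mer count lies in [min_count, max_count].

    Hypothetical selection rule for illustration only: high-copy
    (e.g. chloroplast) reads have high median k-mer counts, while
    single-copy nuclear reads have low ones.
    """
    counts = count_kmers(reads, k)
    selected = []
    for read in reads:
        read_counts = [counts[canonical(read[i:i + k])]
                       for i in range(len(read) - k + 1)]
        if not read_counts:  # read shorter than k: no k-mers to score
            continue
        m = median(read_counts)
        if m >= min_count and (max_count is None or m <= max_count):
            selected.append(read)
    return selected


if __name__ == "__main__":
    # Toy data: one high-copy "organellar" read seen 10 times, two single-copy reads.
    cp = "ACGTTGCAAGGCTTACGATC"
    reads = [cp] * 10 + ["TTTTGGGGCCCCAAAATTTG", "GATTACAGATTACAGGGCCA"]
    print(len(select_high_coverage_reads(reads, k=5, min_count=5)))
```

In this toy run only the ten copies of the repeated read pass the filter. The choice of `k` matters in the same way the abstract notes: a larger k makes k-mers more specific to one genome but spreads the same sequencing depth over fewer exact matches, lowering the counts that the threshold is compared against.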
Pages: 13