A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach

Cited by: 13
Authors
Melicher, Dacotah [1]
Torson, Alex S. [1]
Dworkin, Ian [2]
Bowsher, Julia H. [1]
Affiliations
[1] N Dakota State Univ, Dept Biol Sci, Fargo, ND 58102 USA
[2] Michigan State Univ, Dept Zool, E Lansing, MI 48823 USA
Source
BMC Genomics | 2014 / Vol. 15
Funding
U.S. National Science Foundation
Keywords
Multiple k-mer; de novo assembly; Sepsidae; Transcriptome; Pipeline; Cloud computing; RNA-SEQ DATA; GENE-EXPRESSION; FLIES DIPTERA; ABDOMINAL APPENDAGES; SEXUAL SELECTION; FLY; EVOLUTION; BEHAVIOR; FLYBASE; GALAXY
DOI
10.1186/1471-2164-15-188
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline codes
071005; 0836; 090102; 100705
Abstract

Background: The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, sepsids have few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba.

Results: Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It uses a multiple k-mer length approach combined with a second meta-assembly to extend transcripts and recover more transcript sequence than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA prepared from T. biloba embryos, larvae, and pupae, and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We also assembled transcriptomes from three additional non-model organisms to demonstrate that our pipeline produces a higher-quality transcriptome than single k-mer approaches across multiple species.

Conclusions: Through the use of a meta-assembly, the pipeline we developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this dipteran family.
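The abstract describes the multiple k-mer idea only at a high level: assemble the same reads at several k-mer lengths, pool the resulting contigs, and run a second meta-assembly over the pool. The toy Python sketch below illustrates that general idea under stated assumptions. It builds a simple forward-strand de Bruijn graph for each k and collects its unitigs (maximal non-branching paths). As a hypothetical stand-in for the meta-assembly, it then applies containment filtering followed by greedy suffix-prefix merging. The functions `kmers`, `unitigs`, `overlap`, `greedy_merge`, and `multi_k_assembly`, and the `min_overlap` parameter, are illustrative names. They are not the authors' pipeline or any real assembler's API. Real assemblers also handle reverse complements, sequencing errors, and coverage, and this sketch ignores all three.

```python
from collections import defaultdict


def kmers(reads, k):
    """Collect every k-mer that occurs in the reads (forward strand only)."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def unitigs(kset):
    """Return the unitigs of the de Bruijn graph whose edges are the k-mers
    in kset; k-mer a links to k-mer b when a's last k-1 bases equal b's first."""
    succ, pred = defaultdict(list), defaultdict(list)
    for km in kset:
        for base in "ACGT":
            nxt = km[1:] + base
            if nxt in kset:
                succ[km].append(nxt)
                pred[nxt].append(km)

    def is_start(km):
        # A k-mer merely continues a unitig only if it has exactly one
        # predecessor and that predecessor has exactly one successor.
        p = pred[km]
        return not (len(p) == 1 and len(succ[p[0]]) == 1)

    used, contigs = set(), []

    def walk(km):
        seq = km
        used.add(km)
        while len(succ[km]) == 1:
            nxt = succ[km][0]
            if len(pred[nxt]) != 1 or nxt in used:
                break  # branch point ahead, or we looped back on a cycle
            seq += nxt[-1]  # each step extends the contig by one base
            used.add(nxt)
            km = nxt
        return seq

    for km in sorted(kset):
        if km not in used and is_start(km):
            contigs.append(walk(km))
    for km in sorted(kset):  # anything still unused lies on an isolated cycle
        if km not in used:
            contigs.append(walk(km))
    return contigs


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_merge(contigs, min_overlap):
    """Repeatedly join the pair of contigs with the longest suffix-prefix overlap."""
    contigs = list(contigs)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for x, c in enumerate(contigs) if x not in (i, j)] + [merged]


def multi_k_assembly(reads, ks, min_overlap=3):
    """Pool unitigs from several k values, drop contained contigs, then merge."""
    pooled = set()
    for k in ks:
        pooled.update(unitigs(kmers(reads, k)))
    kept = []
    for c in sorted(pooled, key=lambda s: (-len(s), s)):  # longest first
        if not any(c in longer for longer in kept):
            kept.append(c)
    return greedy_merge(kept, min_overlap)
```

Consider the two toy reads `ACGTTGC` and `TTGCAAT`, which share only four bases. At k = 6 the shared stretch is shorter than k - 1, so `unitigs(kmers(reads, 6))` returns two separate contigs. At k = 3 the overlap is bridged and a single contig, `ACGTTGCAAT`, comes out. `multi_k_assembly(reads, [3, 6])` pools both results and drops the contained fragments, returning `["ACGTTGCAAT"]`. In general, a small k can bridge short overlaps but is more easily broken by repeats, whereas a large k resolves repeats but needs longer overlaps between reads. This trade-off is the usual motivation for combining several k values.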
Pages: 13