TrAnnoScope: A Modular Snakemake Pipeline for Full-Length Transcriptome Analysis and Functional Annotation

Authors
Pektas, Aysevil [1]
Panitz, Frank [1,2]
Thomsen, Bo [1]
Affiliations
[1] Aarhus Univ, Dept Mol Biol & Genet, DK-8000 Aarhus, Denmark
[2] Nat Resources Inst Finland Luke, Appl Stat Methods, Turku 20520, Finland
Keywords
RNA-Seq; reproducible pipeline; high-performance computing (HPC); transcriptome analysis; functional annotation; Iso-Seq; snakemake; long-read sequencing; PROTEIN; DATABASE; MODEL;
DOI
10.3390/genes15121547
Chinese Library Classification (CLC)
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background/Objectives: Transcriptome assembly and functional annotation are essential for understanding gene expression and biological function. Nevertheless, many existing pipelines lack the flexibility to integrate both short- and long-read sequencing data or fail to provide a complete, customizable workflow for transcriptome analysis, particularly for non-model organisms. Methods: We present TrAnnoScope, a transcriptome analysis pipeline designed to process Illumina short-read and PacBio long-read data. The pipeline provides a complete, customizable workflow to generate high-quality, full-length (FL) transcripts with broad functional annotation. Its modular design allows users to adapt specific analysis steps to other sequencing platforms or data types. The pipeline covers every step from quality control to functional annotation, employing established tools and databases such as SwissProt, Pfam, Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and Eukaryotic Orthologous Groups (KOG). As a case study, TrAnnoScope was applied to RNA-Seq and Iso-Seq data from zebra finch brain, ovary, and testis tissue. Results: The zebra finch transcriptome generated by TrAnnoScope from brain, ovary, and testis tissue showed strong alignment with the reference genome (99.63%), and 93.95% of the matched protein sequences in the zebra finch proteome were captured as nearly complete. Functional annotation provided matches to known protein databases and assigned relevant functional terms to the majority of the transcripts. Conclusions: TrAnnoScope successfully integrates short- and long-read sequencing technologies to generate transcriptomes with minimal user input. Its modularity and ease of use make it a valuable tool for researchers analyzing complex datasets, particularly for non-model organisms.
Pages: 15