TrAnnoScope: A Modular Snakemake Pipeline for Full-Length Transcriptome Analysis and Functional Annotation

Authors
Pektas, Aysevil [1]
Panitz, Frank [1, 2]
Thomsen, Bo [1]
Affiliations
[1] Aarhus Univ, Dept Mol Biol & Genet, DK-8000 Aarhus, Denmark
[2] Nat Resources Inst Finland Luke, Appl Stat Methods, Turku 20520, Finland
Keywords
RNA-Seq; reproducible pipeline; high-performance computing (HPC); transcriptome analysis; functional annotation; Iso-Seq; Snakemake; long-read sequencing; PROTEIN; DATABASE; MODEL
DOI
10.3390/genes15121547
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background/Objectives: Transcriptome assembly and functional annotation are essential for understanding gene expression and biological function. Nevertheless, many existing pipelines lack the flexibility to integrate both short- and long-read sequencing data, or fail to provide a complete, customizable workflow for transcriptome analysis, particularly for non-model organisms. Methods: We present TrAnnoScope, a transcriptome analysis pipeline designed to process Illumina short-read and PacBio long-read data. The pipeline provides a complete, customizable workflow to generate high-quality, full-length (FL) transcripts with broad functional annotation. Its modular design allows users to adapt specific analysis steps to other sequencing platforms or data types. The pipeline covers every step from quality control to functional annotation, using established tools and databases such as SwissProt, Pfam, Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and Eukaryotic Orthologous Groups (KOG). As a case study, TrAnnoScope was applied to RNA-Seq and Iso-Seq data from zebra finch brain, ovary, and testis tissue. Results: The zebra finch transcriptome generated by TrAnnoScope from brain, ovary, and testis tissue demonstrated strong alignment with the reference genome (99.63%), and 93.95% of the matched protein sequences in the zebra finch proteome were captured as nearly complete. Functional annotation yielded matches to known protein databases and assigned relevant functional terms to the majority of the transcripts. Conclusions: TrAnnoScope successfully integrates short- and long-read sequencing technologies to generate transcriptomes with minimal user input. Its modularity and ease of use make it a valuable tool for researchers analyzing complex datasets, particularly for non-model organisms.
Pages: 15
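The abstract reports that 93.95% of the matched zebra finch proteins were "captured as nearly complete" but does not define that criterion here. One common way to compute such a figure is to align transcripts against the reference proteome (for example with BLASTX) and count the reference proteins whose best hit covers most of their length. The Python sketch below illustrates that idea only; it is not TrAnnoScope's code. The tabular column order (BLAST+ `-outfmt "6 qseqid sseqid slen sstart send bitscore"`), the 0.9 coverage threshold, and the helper names `parse_outfmt6`, `best_hits`, and `fraction_nearly_complete` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Hit:
    transcript_id: str   # query transcript (qseqid)
    protein_id: str      # reference protein (sseqid)
    protein_length: int  # reference protein length in aa (slen)
    sstart: int          # 1-based alignment start on the protein
    send: int            # 1-based alignment end on the protein
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tabular BLAST+ output, ASSUMING it was produced with
    -outfmt "6 qseqid sseqid slen sstart send bitscore"."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        q, s, slen, sstart, send, bits = line.rstrip("\n").split("\t")
        yield Hit(q, s, int(slen), int(sstart), int(send), float(bits))


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the highest-bitscore hit for each reference protein."""
    best: Dict[str, Hit] = {}
    for h in hits:
        current = best.get(h.protein_id)
        if current is None or h.bitscore > current.bitscore:
            best[h.protein_id] = h
    return best


def fraction_nearly_complete(hits: Iterable[Hit], min_coverage: float = 0.9) -> float:
    """Fraction of matched proteins whose best hit covers at least
    min_coverage of the protein length (threshold is hypothetical)."""
    best = best_hits(hits)
    if not best:
        return 0.0
    complete = 0
    for h in best.values():
        covered = abs(h.send - h.sstart) + 1  # aligned span on the protein
        if covered / h.protein_length >= min_coverage:
            complete += 1
    return complete / len(best)


if __name__ == "__main__":
    example = [
        "t1\tP1\t100\t1\t95\t180.0",   # P1 best hit: 95% of its length
        "t2\tP1\t100\t10\t50\t80.0",   # weaker P1 hit, ignored
        "t3\tP2\t200\t1\t100\t150.0",  # P2 best hit: 50% of its length
    ]
    print(fraction_nearly_complete(parse_outfmt6(example)))  # → 0.5
```

Scoring only the single best alignment per protein is a simplification; a fuller analysis might merge several HSPs per protein or report proteins in coverage bins rather than against one cutoff.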