TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing

Cited by: 5
Authors
Xia, Yuchao [1, 2]
Jin, Zijie [3, 4]
Zhang, Chengsheng [2]
Ouyang, Linkun [5]
Dong, Yuhao [2]
Li, Juan [6]
Guo, Lvze [2]
Jing, Biyang [2]
Shi, Yang [7]
Miao, Susheng [8]
Xi, Ruibin [4, 5, 9]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Coll Sci, Beijing 100192, Peoples R China
[2] Beijing Genex Hlth Technol Co Ltd, Beijing 100195, Peoples R China
[3] Peking Univ, Peking Univ Int Canc Inst, Hlth Sci Ctr, Beijing 100191, Peoples R China
[4] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[5] Peking Univ, Acad Adv Interdisciplinary Studies, Beijing 100871, Peoples R China
[6] Peking Univ, Coll Future Technol, Dept Biomed Engn, Beijing 100871, Peoples R China
[7] BeiGene Beijing Co Ltd, Beijing, Peoples R China
[8] Harbin Med Univ Canc Hosp, Dept Head & Neck Surg, Harbin 150081, Peoples R China
[9] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
RNA; CBL; CANCER; ALIGNMENT; ONCOGENE; PROTEIN; HISAT; CELL;
DOI
10.1038/s41467-023-41649-0
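Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract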
Single-molecule real-time isoform sequencing (Iso-Seq) of transcriptomes by PacBio can generate very long and accurate reads, making it an ideal platform for full-length transcriptome analysis. We present TAGET, an integrated computational toolkit for analyzing Iso-Seq full-length transcript data, including transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential gene expression analysis and differential isoform usage analysis. We evaluate the performance of TAGET using a public Iso-Seq dataset and newly sequenced Iso-Seq datasets from tumor patients. TAGET gives significantly more precise novel splice site prediction and enables more accurate novel isoform and gene fusion discoveries, as confirmed by experimental validation and comparisons with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may be a tumor suppressor in laryngocarcinoma. Our results demonstrate that TAGET provides a valuable computational toolkit and can be applied to many full-length transcriptome studies. Accurate long-read RNA sequencing facilitates analysis of full-length transcripts. Here the authors develop an integrative toolkit, optimised for Iso-Seq data analysis, that includes transcript alignment, annotation, quantification and gene fusion detection.
Pages: 12