TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing

Cited by: 5
Authors
Xia, Yuchao [1, 2]
Jin, Zijie [3, 4]
Zhang, Chengsheng [2 ]
Ouyang, Linkun [5 ]
Dong, Yuhao [2 ]
Li, Juan [6 ]
Guo, Lvze [2 ]
Jing, Biyang [2 ]
Shi, Yang [7 ]
Miao, Susheng [8 ]
Xi, Ruibin [4, 5, 9]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Coll Sci, Beijing 100192, Peoples R China
[2] Beijing Genex Hlth Technol Co Ltd, Beijing 100195, Peoples R China
[3] Peking Univ, Peking Univ Int Canc Inst, Hlth Sci Ctr, Beijing 100191, Peoples R China
[4] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[5] Peking Univ, Acad Adv Interdisciplinary Studies, Beijing 100871, Peoples R China
[6] Peking Univ, Coll Future Technol, Dept Biomed Engn, Beijing 100871, Peoples R China
[7] BeiGene Beijing Co Ltd, Beijing, Peoples R China
[8] Harbin Med Univ Canc Hosp, Dept Head & Neck Surg, Harbin 150081, Peoples R China
[9] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
RNA; CBL; CANCER; ALIGNMENT; ONCOGENE; PROTEIN; HISAT; CELL;
DOI
10.1038/s41467-023-41649-0
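Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract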
Single-molecule real-time isoform sequencing (Iso-Seq) of transcriptomes by PacBio generates very long and accurate reads, making it an ideal platform for full-length transcriptome analysis. We present TAGET, an integrated computational toolkit for analyzing Iso-Seq full-length transcript data, including transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential gene expression analysis and differential isoform usage analysis. We evaluate the performance of TAGET using a public Iso-Seq dataset and newly sequenced Iso-Seq datasets from tumor patients. TAGET gives significantly more precise novel splice site predictions and enables more accurate discovery of novel isoforms and gene fusions, as confirmed by experimental validation and comparison with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may be a tumor suppressor in laryngocarcinoma. Our results demonstrate that TAGET is a valuable computational toolkit that can be applied to many full-length transcriptome studies.
Accurate long-read RNA sequencing facilitates analysis of full-length transcripts. Here the authors develop an integrative toolkit, optimised for Iso-Seq data analysis, that includes transcript alignment, annotation, quantification and gene fusion detection.
Pages: 12