TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing

Cited by: 5
Authors
Xia, Yuchao [1 ,2 ]
Jin, Zijie [3 ,4 ]
Zhang, Chengsheng [2 ]
Ouyang, Linkun [5 ]
Dong, Yuhao [2 ]
Li, Juan [6 ]
Guo, Lvze [2 ]
Jing, Biyang [2 ]
Shi, Yang [7 ]
Miao, Susheng [8 ]
Xi, Ruibin [4 ,5 ,9 ]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Coll Sci, Beijing 100192, Peoples R China
[2] Beijing Genex Hlth Technol Co Ltd, Beijing 100195, Peoples R China
[3] Peking Univ, Peking Univ Int Canc Inst, Hlth Sci Ctr, Beijing 100191, Peoples R China
[4] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[5] Peking Univ, Acad Adv Interdisciplinary Studies, Beijing 100871, Peoples R China
[6] Peking Univ, Coll Future Technol, Dept Biomed Engn, Beijing 100871, Peoples R China
[7] BeiGene Beijing Co Ltd, Beijing, Peoples R China
[8] Harbin Med Univ Canc Hosp, Dept Head & Neck Surg, Harbin 150081, Peoples R China
[9] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
RNA; CBL; CANCER; ALIGNMENT; ONCOGENE; PROTEIN; HISAT; CELL;
DOI
10.1038/s41467-023-41649-0
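Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract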
Single-molecule real-time isoform sequencing (Iso-Seq) of transcriptomes by PacBio can generate very long and accurate reads, providing an ideal platform for full-length transcriptome analysis. We present TAGET, an integrated computational toolkit for analyzing Iso-Seq full-length transcript data. It covers transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential gene expression analysis and differential isoform usage analysis. We evaluate the performance of TAGET using a public Iso-Seq dataset and newly sequenced Iso-Seq datasets from tumor patients. TAGET gives significantly more precise novel splice site prediction and enables more accurate discovery of novel isoforms and gene fusions, as confirmed by experimental validation and comparison with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may be a tumor suppressor in laryngocarcinoma. Our results demonstrate that TAGET provides a valuable computational toolkit that can be applied to many full-length transcriptome studies.

Accurate long-read RNA sequencing facilitates analysis of full-length transcripts. Here the authors develop an integrative toolkit, optimised for Iso-Seq data analysis, that includes transcript alignment, annotation, quantification and gene fusion detection.
Pages: 12