Generative pretraining from large-scale transcriptomes for single-cell deciphering

Cited by: 16
Authors
Shen, Hongru [1]
Liu, Jilei [1]
Hu, Jiani [1]
Shen, Xilin [1]
Zhang, Chao [2]
Wu, Dan [1]
Feng, Mengyao [1]
Yang, Meng [1]
Li, Yang [1]
Yang, Yichen [1]
Wang, Wei [3]
Zhang, Qiang [4]
Yang, Jilong [2]
Chen, Kexin [3]
Li, Xiangchun [1]
Affiliations
[1] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Tianjin Canc Inst, Tianjins Clin Res Ctr Canc,Natl Clin Res Ctr Canc, Tianjin, Peoples R China
[2] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Dept Bone & Soft Tissue Tumor, Tianjins Clin Res Ctr Canc,Natl Clin Res Ctr Canc, Tianjin, Peoples R China
[3] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Dept Epidemiol & Biostat, Natl Clin Res Ctr Canc,Key Lab Mol Canc Epidemiol, Tianjin, Peoples R China
[4] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Tianjins Clin Res Ctr Canc, Dept Maxillofacial & Otorhinolaryngol Oncol,Natl C, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
EXPRESSION; TISSUES;
DOI
10.1016/j.isci.2023.106536
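Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes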
07 ; 0710 ; 09 ;
Abstract
The exponential accumulation of single-cell transcriptomes poses a great challenge for efficient assimilation. Here, we present an approach entitled generative pretraining from transcriptomes (tGPT) for learning feature representations of transcriptomes. tGPT is conceptually simple in that it autoregressively models the ranking of a gene in the context of its preceding neighbors. We developed tGPT with 22.3 million single-cell transcriptomes and used four single-cell datasets to evaluate its performance on single-cell analysis tasks. In addition, we examine its applications on bulk tissues. The single-cell clusters and cell lineage trajectories derived from tGPT are highly aligned with known cell labels and states. The feature patterns of tumor bulk tissues learned by tGPT are associated with a wide range of genomic alteration events, prognosis, and treatment outcome of immunotherapy. tGPT represents a new analytical paradigm for integrating and deciphering massive amounts of transcriptome data, and it will facilitate the interpretation and clinical translation of single-cell transcriptomes.
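The abstract's central idea, modeling the rank of a gene given the genes ranked before it, can be illustrated with a minimal sketch. The abstract does not specify how the ranking is built or tokenized, so everything here is an assumption made for illustration: the helper names `rank_genes` and `next_gene_pairs`, ordering by descending expression, dropping unexpressed genes, the optional `top_k` cutoff, and the toy gene list and counts.

```python
def rank_genes(expression, gene_names, top_k=None):
    """Order gene names by descending expression (assumed ranking rule).

    Genes with zero expression are dropped, and ties keep their input
    order because Python's sort is stable. Both choices are assumptions.
    """
    expressed = [(value, name) for value, name in zip(expression, gene_names) if value > 0]
    expressed.sort(key=lambda pair: -pair[0])
    ranked = [name for _, name in expressed]
    return ranked if top_k is None else ranked[:top_k]


def next_gene_pairs(ranked):
    """Build (context, target) pairs for autoregressive next-gene prediction.

    Each target gene is paired with all genes ranked before it.
    """
    return [(ranked[:i], ranked[i]) for i in range(1, len(ranked))]


# Toy cell with five genes (hypothetical counts).
genes = ["CD3E", "MS4A1", "LYZ", "GNLY", "ACTB"]
counts = [5.0, 0.0, 2.0, 7.0, 9.0]

ranked = rank_genes(counts, genes)
pairs = next_gene_pairs(ranked)

for context, target in pairs:
    print(context, "->", target)
```

In this framing a cell becomes a sequence of gene tokens, and a generative model would be trained to predict each target from its context, in the same way a language model predicts the next word.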
Pages: 20