Reusability report: Learning the transcriptional grammar in single-cell RNA-sequencing data using transformers

Cited by: 7
Authors
Khan, Sumeer Ahmad [1, 2]
Maillo, Alberto [1]
Lagani, Vincenzo [1, 2, 3]
Lehmann, Robert [1]
Kiani, Narsis A. [4, 5]
Gomez-Cabrero, David [1, 6]
Tegner, Jesper [1, 5, 7, 8]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Biol & Environm Sci & Engn Div, Thuwal, Saudi Arabia
[2] SDAIA KAUST Ctr Excellence Data Sci & Artificial I, Thuwal, Saudi Arabia
[3] Ilia State Univ, Inst Chem Biol, Tbilisi, Georgia
[4] Karolinska Inst, Dept Oncol & Pathol, Algorithm Dynam Lab, Stockholm, Sweden
[5] Karolinska Univ Hosp, Karolinska Inst, Ctr Mol Med, Dept Med, Unit Comp Med, Stockholm, Sweden
[6] Univ Publ Navarra UPNA, IdiSNA, Navarrabiomed, Translat Bioinformat Unit, Pamplona, Spain
[7] King Abdullah Univ Sci & Technol KAUST, Comp Elect & Math Sci & Engn Div, Thuwal, Saudi Arabia
[8] Sci Life Lab, Solna, Sweden
Keywords
Cell types - Data driven - Embeddings - Genomic data - Genomics - Language processing - Machine learning algorithms - Natural languages - Pre-training - Single cells
DOI
10.1038/s42256-023-00757-8
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rise of single-cell genomics is an attractive opportunity for data-hungry machine learning algorithms. The scBERT method, inspired by the success of BERT ('bidirectional encoder representations from transformers') in natural language processing, was recently introduced by Yang et al. as a data-driven tool to annotate cell types in single-cell genomics data. Analogous to contextual embedding in BERT, scBERT leverages pretraining and self-attention mechanisms to learn the 'transcriptional grammar' of cells. Here we investigate the reusability beyond the original datasets, assessing the generalizability of natural language techniques in single-cell genomics. The degree of imbalance in the cell-type distribution substantially influences the performance of scBERT. Anticipating an increased utilization of transformers, we highlight the necessity to consider data distribution carefully and introduce a subsampling technique to mitigate the influence of an imbalanced distribution. Our analysis serves as a stepping stone towards understanding and optimizing the use of transformers in single-cell genomics.
scBERT, a pretrained neural network for single-cell sequencing tasks, was published last year in Nature Machine Intelligence. To test the reusability of the method, Khan et al. use the code to assess the generalizability of transformer architectures on single-cell genomics tasks.
Pages: 1437-1446
Number of pages: 13
Source: Nature Machine Intelligence, 2023, 5: 1437-1446