Reusability report: Learning the transcriptional grammar in single-cell RNA-sequencing data using transformers

Cited by: 7
Authors
Khan, Sumeer Ahmad [1, 2]
Maillo, Alberto [1]
Lagani, Vincenzo [1, 2, 3]
Lehmann, Robert [1]
Kiani, Narsis A. [4, 5]
Gomez-Cabrero, David [1, 6]
Tegner, Jesper [1, 5, 7, 8]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Biol & Environm Sci & Engn Div, Thuwal, Saudi Arabia
[2] SDAIA KAUST Ctr Excellence Data Sci & Artificial I, Thuwal, Saudi Arabia
[3] Ilia State Univ, Inst Chem Biol, Tbilisi, Georgia
[4] Karolinska Inst, Dept Oncol & Pathol, Algorithm Dynam Lab, Stockholm, Sweden
[5] Karolinska Univ Hosp, Karolinska Inst, Ctr Mol Med, Dept Med,Unit Comp Med, Stockholm, Sweden
[6] Univ Publ Navarra UPNA, IdiSNA, Navarrabiomed, Translat Bioinformat Unit, Pamplona, Spain
[7] King Abdullah Univ Sci & Technol KAUST, Comp Elect & Math Sci & Engn Div, Thuwal, Saudi Arabia
[8] Sci Life Lab, Solna, Sweden
Keywords
Cell types - Data driven - Embeddings - Genomic data - Genomics - Language processing - Machine learning algorithms - Natural languages - Pre-training - Single cells
DOI
10.1038/s42256-023-00757-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rise of single-cell genomics is an attractive opportunity for data-hungry machine learning algorithms. The scBERT method, inspired by the success of BERT ('bidirectional encoder representations from transformers') in natural language processing, was recently introduced by Yang et al. as a data-driven tool to annotate cell types in single-cell genomics data. Analogous to contextual embedding in BERT, scBERT leverages pretraining and self-attention mechanisms to learn the 'transcriptional grammar' of cells. Here we investigate the reusability of scBERT beyond the original datasets, assessing the generalizability of natural language techniques in single-cell genomics. The degree of imbalance in the cell-type distribution substantially influences the performance of scBERT. Anticipating an increased utilization of transformers, we highlight the necessity to consider data distribution carefully and introduce a subsampling technique to mitigate the influence of an imbalanced distribution. Our analysis serves as a stepping stone towards understanding and optimizing the use of transformers in single-cell genomics.
Editor's summary: scBERT, a pretrained neural network for single-cell sequencing tasks, was published last year in Nature Machine Intelligence. To test the reusability of the method, Khan et al. use the code to assess the generalizability of transformer architectures on single-cell genomics tasks.
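The abstract mentions a subsampling technique for imbalanced cell-type distributions but does not describe it. As a generic illustration only, and not the authors' procedure, the sketch below caps the number of cells kept per cell type so that abundant types no longer dominate the training set. The function name `subsample_balanced`, the `max_per_class` parameter, and the toy labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def subsample_balanced(labels, max_per_class, seed=0):
    """Return sorted indices keeping at most `max_per_class` cells per label.

    Hypothetical helper: a plain per-class cap, assumed here for
    illustration; the paper's own subsampling scheme may differ.
    """
    rng = random.Random(seed)

    # Group cell indices by their annotated cell type.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    kept = []
    for indices in by_label.values():
        # Randomly downsample only the classes that exceed the cap.
        if len(indices) > max_per_class:
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)
    return sorted(kept)


# Toy, made-up label vector with a strongly imbalanced distribution.
labels = ["T cell"] * 900 + ["B cell"] * 90 + ["NK cell"] * 10
kept = subsample_balanced(labels, max_per_class=50)
counts = Counter(labels[i] for i in kept)
print(counts)
```

Rare types (here the 10 hypothetical NK cells) are kept in full, while abundant types are reduced to the cap; the returned indices can then be used to slice an expression matrix before training.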
Pages: 1437-1446
Page count: 13