Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN

Cited by: 3
Authors
Rosen, Yanay [1 ]
Brbic, Maria [2 ]
Roohani, Yusuf [3 ]
Swanson, Kyle [1 ]
Li, Ziang [4 ]
Leskovec, Jure [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Swiss Fed Inst Technol EPFL, Sch Comp & Commun Sci, Lausanne, Switzerland
[3] Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA
[4] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Funding
US National Institutes of Health (NIH);
Keywords
AQUEOUS-HUMOR; LANGUAGE; GLAUCOMA;
DOI
10.1038/s41592-024-02191-z
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species. SATURN performs cross-species integration and analysis using both single-cell gene expression and protein representations generated by protein language models.
Pages: 1492-1500
Number of pages: 29
Related papers
50 in total
  • [1] An interpretable framework for clustering single-cell RNA-Seq datasets
    Jesse M. Zhang
    Jue Fan
    H. Christina Fan
    David Rosenfeld
    David N. Tse
    BMC Bioinformatics, 19
  • [2] Single-cell RNA-seq clustering: datasets, models, and algorithms
    Peng, Lihong
    Tian, Xiongfei
    Tian, Geng
    Xu, Junlin
    Huang, Xin
    Weng, Yanbin
    Yang, Jialiang
    Zhou, Liqian
    RNA BIOLOGY, 2020, 17 (06) : 765 - 783
  • [3] Processing single-cell RNA-seq datasets using SingCellaR
    Wang, Guanlin
    Wen, Wei Xiong
    Mead, Adam J.
    Roy, Anindita
    Psaila, Bethan
    Thongjuea, Supat
    STAR PROTOCOLS, 2022, 3 (02):
  • [4] An interpretable framework for clustering single-cell RNA-Seq datasets
    Zhang, Jesse M.
    Fan, Jue
    Fan, Christina
    Rosenfeld, David
    Tse, David N.
    BMC BIOINFORMATICS, 2018, 19
  • [5] Improving Single-Cell RNA-seq Clustering by Integrating Pathways
    Zhang, Chenxing
    Gao, Lin
    Wang, Bingbo
    Gao, Yong
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)
  • [6] Integrating single-cell RNA-seq and imaging with SCOPE-seq2
    Liu, Zhouzerui
    Yuan, Jinzhou
    Lasorella, Anna
    Iavarone, Antonio
    Bruce, Jeffrey N.
    Canoll, Peter
    Sims, Peter A.
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [7] Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods
    Ryu, Yeonjae
    Han, Geun Hee
    Jung, Eunsoo
    Hwang, Daehee
    MOLECULES AND CELLS, 2023, 46 (02) : 106 - 119
  • [8] Integrating single-cell RNA-seq and imaging with SCOPE-seq2
    Zhouzerui Liu
    Jinzhou Yuan
    Anna Lasorella
    Antonio Iavarone
    Jeffrey N. Bruce
    Peter Canoll
    Peter A. Sims
    Scientific Reports, 10
  • [9] Consequences and opportunities arising due to sparser single-cell RNA-seq datasets
    Gerard A. Bouland
    Ahmed Mahfouz
    Marcel J. T. Reinders
    Genome Biology, 24
  • [10] Evaluation of single-cell RNA-seq clustering algorithms on cancer tumor datasets
    Mahalanabis, Alaina
    Turinsky, Andrei L.
    Husic, Mia
    Christensen, Erik
    Luo, Ping
    Naidas, Alaine
    Brudno, Michael
    Pugh, Trevor
    Ramani, Arun K.
    Shooshtari, Parisa
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2022, 20 : 6375 - 6387