K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis

Cited by: 3
Authors
Cottrell S. [1]
Hozumi Y. [1]
Wei G.-W. [1, 2, 3]
Affiliations
[1] Department of Mathematics, Michigan State University, East Lansing, MI 48824
[2] Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824
[3] Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824
Funding
National Aeronautics and Space Administration; National Science Foundation; National Institutes of Health
Keywords
Clustering; Dimensionality reduction; Machine learning; Persistent homology; Persistent Laplacian; scRNA-seq; Topology
DOI
10.1016/j.compbiomed.2024.108497
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, and it has yielded insights into cell–cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because the data are sparse and involve a large number of genes. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Traditional principal component analysis (PCA), a workhorse of dimensionality reduction, cannot capture the geometric structure embedded in the data, and previous graph Laplacian regularizations are limited to analysis at a single scale. We propose a topological PCA (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1 norm regularization to address multiscale and multiclass heterogeneity in data. We further introduce a k-nearest-neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, we introduce kNN-tPCA, in which the filtration is obtained by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyperparameter tuning. We validate the efficacy of tPCA and kNN-tPCA on 11 diverse benchmark scRNA-seq datasets and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins. For example, tPCA provides improvements of up to 628%, 78%, and 149% over UMAP, tSNE, and NMF, respectively, in classification as measured by the F1 metric, and kNN-tPCA offers improvements of 53%, 63%, and 32% over UMAP, tSNE, and NMF, respectively, in clustering as measured by the ARI metric. © 2024 Elsevier Ltd
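The idea of a filtration indexed by neighbor count, rather than by a distance threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: it builds ordinary (0-dimensional) graph Laplacians on nested kNN graphs and combines them with hypothetical per-scale weights, which only approximates the spirit of the persistent Laplacian described in the abstract. The function names (`knn_adjacency`, `knn_filtration_laplacian`, `l21_norm`) and the weighting scheme are assumptions made for illustration; `l21_norm` uses the standard definition of the L2,1 norm as the sum of the Euclidean norms of the rows.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetrized 0/1 adjacency of the k-nearest-neighbor graph on the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest points per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # keep an edge if either end selects it

def graph_laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def knn_filtration_laplacian(X, ks, weights=None):
    """Weighted sum of graph Laplacians over a kNN filtration.

    ks      : increasing neighbor counts, e.g. [1, 2, 3]; each k gives one scale
    weights : hypothetical per-scale weights (assumed, defaults to all ones)
    """
    if weights is None:
        weights = np.ones(len(ks))
    return sum(w * graph_laplacian(knn_adjacency(X, k)) for w, k in zip(weights, ks))

def l21_norm(M):
    """L2,1 norm: sum over rows of each row's Euclidean norm."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))          # 8 hypothetical cells, 5 features
L = knn_filtration_laplacian(X, ks=[1, 2, 3])
```

Because each point's k nearest neighbors are a subset of its k+1 nearest neighbors, the graphs are nested as k grows, which is what makes the sequence a filtration. One practical consequence, consistent with the abstract's remark on hyperparameter tuning, is that the scale parameter becomes a small integer rather than a dataset-dependent distance.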