K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis

Cited by: 3
Authors
Cottrell S. [1]
Hozumi Y. [1]
Wei G.-W. [1, 2, 3]
Affiliations
[1] Department of Mathematics, Michigan State University, East Lansing, MI 48824
[2] Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824
[3] Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824
Funding
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF); National Institutes of Health (NIH)
Keywords
Clustering; Dimensionality reduction; Machine learning; Persistent homology; Persistent Laplacian; scRNA-seq; Topology
DOI
10.1016/j.compbiomed.2024.108497
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity and has provided insights into cell–cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because of its sparsity and the large number of genes involved. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Traditional principal component analysis (PCA), a workhorse of dimensionality reduction, cannot capture the geometric structure embedded in the data, and previous graph Laplacian regularizations are limited to a single scale. We propose a topological PCA (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1 norm regularization to address multiscale and multiclass heterogeneity in the data. We further introduce a k-nearest-neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, kNN-tPCA achieves filtrations by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyperparameter tuning. We validate the efficacy of the proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets and show that they outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Non-Negative Matrix Factorization (NMF), by significant margins.
For example, in classification, tPCA improves the F1 score by up to 628%, 78%, and 149% over UMAP, tSNE, and NMF, respectively; in clustering, kNN-tPCA improves the ARI by 53%, 63%, and 32% over UMAP, tSNE, and NMF, respectively. © 2024 Elsevier Ltd
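The abstract names two ingredients that can be illustrated in a few lines: a kNN-induced filtration, where the number of neighbors k grows step by step instead of a distance threshold, and a Laplacian-regularized PCA objective. The Python sketch below is a simplified illustration under stated assumptions, not the authors' tPCA or kNN-tPCA implementation. It uses the ordinary graph Laplacian L = D - A of each kNN graph in place of the full persistent Laplacian, merges the stages with hypothetical user-chosen weights, and solves the graph-regularized PCA problem min ||X - U V^T||_F^2 + alpha tr(V^T L V) subject to V^T V = I in closed form. The L2,1 norm term (the sum of the Euclidean norms of a matrix's rows) is omitted. All function names and the parameters `ks`, `weights`, and `alpha` are hypothetical.

```python
import numpy as np


def knn_adjacency(Y, k):
    """Symmetric 0/1 adjacency matrix of the kNN graph on the rows (cells) of Y."""
    n = Y.shape[0]
    # Pairwise squared Euclidean distances between rows of Y.
    sq = np.sum(Y ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    np.fill_diagonal(d2, np.inf)  # a cell is never its own neighbor
    # Column indices of the k closest cells for each cell (requires k < n).
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    # Symmetrize: connect i and j if either is among the other's k neighbors.
    return np.maximum(A, A.T)


def knn_filtration_laplacians(Y, ks):
    """Graph Laplacians L = D - A for kNN graphs with increasing k.

    With no distance ties, the graphs are nested as k grows, so they form
    a filtration indexed by the number of neighbors.
    """
    laps = []
    for k in ks:
        A = knn_adjacency(Y, k)
        laps.append(np.diag(A.sum(axis=1)) - A)
    return laps


def laplacian_regularized_pca(X, laps, weights, alpha, n_components):
    """Closed-form graph-regularized PCA (simplified stand-in, not tPCA).

    Minimizes ||X - U V^T||_F^2 + alpha * tr(V^T L V) subject to V^T V = I.
    X is genes x cells; V (cells x n_components) is the cell embedding.
    """
    # Center each gene across cells, as in ordinary PCA.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Hypothetical multiscale regularizer: weighted sum of filtration Laplacians.
    L = sum(w * Lk for w, Lk in zip(weights, laps))
    # Substituting U = Xc V leaves tr(V^T (alpha L - Xc^T Xc) V) to minimize.
    G = alpha * L - Xc.T @ Xc
    _, evecs = np.linalg.eigh(G)  # eigenvalues come back in ascending order
    V = evecs[:, :n_components]   # eigenvectors with the smallest eigenvalues
    U = Xc @ V
    return U, V
```

Substituting the optimal U = XV turns the objective into tr(V^T (alpha L - X^T X) V), so the embedding V is given by the eigenvectors of alpha L - X^T X with the smallest eigenvalues; a larger alpha pushes cells that are neighbors in the kNN graphs toward similar coordinates. For example, `knn_filtration_laplacians(X.T, [3, 5, 8])` builds three nested graphs on the cells (the columns of a genes-by-cells matrix X) when there are no distance ties.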