K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis

Cited by: 3
Authors
Cottrell S. [1]
Hozumi Y. [1]
Wei G.-W. [1, 2, 3]
Affiliations
[1] Department of Mathematics, Michigan State University, East Lansing, MI 48824
[2] Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824
[3] Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824
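Funding
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF); National Institutes of Health (NIH)
Keywords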
Clustering; Dimensionality reduction; Machine learning; Persistent homology; Persistent Laplacian; scRNA-seq; Topology;
DOI
10.1016/j.compbiomed.2024.108497
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell–cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because of its sparsity and the large number of genes involved. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Traditional principal component analysis (PCA), a main workhorse in dimensionality reduction, cannot capture the geometrical structure embedded in the data, and previous graph Laplacian regularizations are limited to the analysis of a single scale. We propose a topological PCA (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1 norm regularization to address multiscale and multiclass heterogeneity in data. We further introduce a k-nearest-neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses the many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, we introduce kNN-tPCA, in which the filtration is obtained by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of the proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets and show that they outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins. For example, tPCA provides up to 628%, 78%, and 149% improvements over UMAP, tSNE, and NMF, respectively, on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements over UMAP, tSNE, and NMF, respectively, on clustering in the ARI metric. © 2024 Elsevier Ltd
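The kNN-induced filtration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each filtration step is the ordinary graph Laplacian of a symmetrized kNN graph and that the steps are combined by a weighted sum, which is a simplification of a persistent Laplacian. The function names, the weights, and the row-wise convention for the L2,1 norm are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph on the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # indices of the k nearest points
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # symmetrize

def graph_laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def knn_filtration_laplacian(X, ks, weights):
    """Weighted sum of kNN graph Laplacians over an increasing sequence of k (assumed form)."""
    return sum(w * graph_laplacian(knn_adjacency(X, k)) for k, w in zip(ks, weights))

def l21_norm(M):
    """L2,1 norm under the row-wise convention: sum of the Euclidean norms of the rows."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

# Toy data standing in for a cells-by-genes matrix (hypothetical values).
X = np.random.default_rng(0).normal(size=(8, 5))
L = knn_filtration_laplacian(X, ks=[1, 2, 3], weights=[1.0, 1.0, 1.0])
```

Because the k nearest neighbors of a point are always among its k+1 nearest neighbors, the graphs are nested as k grows. That nesting is what lets the neighbor count play the role a distance threshold plays in a conventional filtration.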