Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings

Cited by: 13
Authors
Nakayama, Yugo [1]
Yata, Kazuyoshi [2]
Aoshima, Makoto [2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
[2] Univ Tsukuba, Inst Math, Tsukuba, Ibaraki 3058571, Japan
Keywords
HDLSS; Non-linear PCA; PC score; Radial basis function kernel; Spherical data; STATISTICAL SIGNIFICANCE; GEOMETRIC REPRESENTATION; DATA CLASSIFICATION; PCA;
DOI
10.1016/j.jmva.2021.104779
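Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract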
In this paper, we consider clustering based on the kernel principal component analysis (KPCA) for high-dimension, low-sample-size (HDLSS) data. We give theoretical reasons why the Gaussian kernel is effective for clustering high-dimensional data. In addition, we discuss a choice of the scale parameter yielding a high performance of the KPCA with the Gaussian kernel. Finally, we test the performance of the clustering by using microarray data sets. © 2021 The Author(s). Published by Elsevier Inc.
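The kind of procedure the abstract refers to can be illustrated with a minimal sketch. It is not the authors' method. It assumes the kernel form k(x, y) = exp(-||x - y||^2 / gamma), uses the median of the pairwise squared distances as a stand-in for the scale parameter (the paper's own choice is not reproduced here), and splits the sample into two clusters by the sign of the first kernel PC score. The function names and the synthetic data generator are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding


def gaussian_kpca_scores(X, gamma=None, n_components=1):
    """Kernel PC scores with the kernel exp(-||x - y||^2 / gamma)."""
    D = pairwise_sq_dists(X)
    n = D.shape[0]
    if gamma is None:
        # Assumption: median heuristic, not the scale choice from the paper.
        gamma = np.median(D[np.triu_indices(n, k=1)])
    K = np.exp(-D / gamma)
    # Center the Gram matrix in feature space.
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Score of sample i on component j is sqrt(lambda_j) * u_j[i].
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def two_cluster_labels(X, gamma=None):
    """Assign two clusters by the sign of the first kernel PC score."""
    scores = gaussian_kpca_scores(X, gamma=gamma, n_components=1)
    return (scores[:, 0] > 0).astype(int)


def make_two_class_data(n_per_class=10, dim=2000, shift=1.0, seed=0):
    """Hypothetical synthetic HDLSS data: two Gaussian classes with a mean shift."""
    rng = np.random.default_rng(seed)
    X0 = rng.standard_normal((n_per_class, dim))
    X1 = rng.standard_normal((n_per_class, dim)) + shift
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)
```

Calling `two_cluster_labels(X)` on data from `make_two_class_data()` returns a 0/1 label per sample; the labels are only defined up to swapping, because the sign of an eigenvector is arbitrary.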
Pages: 15
Related papers
50 in total
  • [31] Asymptotic properties of distance-weighted discrimination and its bias correction for high-dimension, low-sample-size data (vol 4, pg 821, 2021)
    Egashira, Kento
    Yata, Kazuyoshi
    Aoshima, Makoto
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2022, 5 (02) : 717 - 718
  • [32] Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size
    de Souza, Valter Cesar
    Rodrigues, Sergio Augusto
    Almeida Gabriel Filho, Luis Roberto
    PLOS ONE, 2024, 19 (12):
  • [33] Improvement of Classification Performance in High-Dimension Low-Sample-Size Modeling by Sparse Functional Connectivity States in Subjects with Attention Deficit-Hyperactivity Disorder and Healthy Controls
    Zolghadr, Zahra
    Batouli, Seyed Amirhossein
    Tehrani-Doost, Mehdi
    Shafaghi, Lida
    Hadjighassem, Mahmoudreza
    Majd, Hamid Alavi
    Mehrabi, Yadollah
    ARCHIVES OF NEUROSCIENCE, 2023, 10 (02)
  • [34] Intrinsic Dimensionality Estimation of High-Dimension, Low Sample Size Data with D-Asymptotics
    Yata, Kazuyoshi
    Aoshima, Makoto
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2010, 39 (8-9) : 1511 - 1521
  • [35] Principal Component Analysis for High Dimension Stochastic Gaussian Process Model Fitting
    Xuereb, M.
    Huo, T. M.
    Ng, S. H.
    2019 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT (IEEM), 2019: 632 - 636
  • [36] Effect of dimension reduction by principal component analysis on clustering
    Erisoglu, Murat
    Erisoglu, Ulku
    JOURNAL OF STATISTICS AND MANAGEMENT SYSTEMS, 2011, 14 (02) : 277 - 287
  • [37] Experimental Analysis of Feature Selection Stability for High-Dimension and Low-Sample Size Gene Expression Classification Task
    Dernoncourt, David
    Hanczar, Blaise
    Zucker, Jean-Daniel
    IEEE 12TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS & BIOENGINEERING, 2012: 350 - 355
  • [38] Sample size for principal component analysis in corn
    Cargnelutti Filho, Alberto
    Toebe, Marcos
    PESQUISA AGROPECUARIA BRASILEIRA, 2021, 56
  • [39] Automatic Gaussian Bandwidth Selection for Kernel Principal Component Analysis
    Shen, Kai
    Wang, Haoyu
    Chaudhuri, Arin
    Asgharzadeh, Zohreh
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT I, KSEM 2023, 2023, 14117 : 15 - 26
  • [40] Gaussian Graphical Model Exploration and Selection in High Dimension Low Sample Size Setting
    Lartigue, Thomas
    Bottani, Simona
    Baron, Stephanie
    Colliot, Olivier
    Durrleman, Stanley
    Allassonniere, Stephanie
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (09) : 3196 - 3213