Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings

Cited by: 13
Authors
Nakayama, Yugo [1]
Yata, Kazuyoshi [2]
Aoshima, Makoto [2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
[2] Univ Tsukuba, Inst Math, Tsukuba, Ibaraki 3058571, Japan
Keywords
HDLSS; Non-linear PCA; PC score; Radial basis function kernel; Spherical data; STATISTICAL SIGNIFICANCE; GEOMETRIC REPRESENTATION; DATA CLASSIFICATION; PCA;
DOI
10.1016/j.jmva.2021.104779
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
In this paper, we consider clustering based on the kernel principal component analysis (KPCA) for high-dimension, low-sample-size (HDLSS) data. We give theoretical reasons why the Gaussian kernel is effective for clustering high-dimensional data. In addition, we discuss a choice of the scale parameter yielding a high performance of the KPCA with the Gaussian kernel. Finally, we test the performance of the clustering by using microarray data sets. (C) 2021 The Author(s). Published by Elsevier Inc.
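The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the kernel form exp(-||x - y||^2 / gamma), the median-of-squared-distances default for the scale parameter `gamma`, the two-cluster rule based on the sign of the first kernel PC score, and the simulated data are all assumptions made here for illustration.

```python
import numpy as np


def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)  # clip small negative rounding errors


def first_kpca_score(X, gamma=None):
    """First kernel PC score under a Gaussian kernel (illustrative sketch)."""
    n = X.shape[0]
    d2 = squared_distances(X)
    if gamma is None:
        # Assumption: median heuristic over off-diagonal pairs,
        # not the scale-parameter choice studied in the paper.
        gamma = np.median(d2[np.triu_indices(n, k=1)])
    K = np.exp(-d2 / gamma)
    # Double-center the kernel matrix (centering in feature space).
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # eigh returns eigenvalues in ascending order; take the largest.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    return np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]


# Hypothetical HDLSS example: n = 20 samples, d = 1000 dimensions,
# two groups whose means differ by 1.0 in every coordinate.
rng = np.random.default_rng(0)
n_per_group, d = 10, 1000
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_group, d)),
    rng.normal(1.0, 1.0, size=(n_per_group, d)),
])
truth = np.repeat([0, 1], n_per_group)

scores = first_kpca_score(X)
labels = (scores > 0).astype(int)  # split on the sign of the first PC score

# Cluster labels are only defined up to swapping 0 and 1.
agreement = np.mean(labels == truth)
accuracy = max(agreement, 1.0 - agreement)
```

Passing a different `gamma` to `first_kpca_score` makes it easy to see how sensitive the split is to the scale parameter, which is the tuning question the abstract raises.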
Pages: 15