Subspace clustering of high-dimensional data: a predictive approach

Cited by: 56
Authors
McWilliams, Brian [2]
Montana, Giovanni [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Math, London, England
[2] ETH, Dept Informat, Zurich, Switzerland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Subspace clustering; PCA; PRESS statistics; Variable selection; Model selection; Microarrays; CROSS-VALIDATION; CLASS DISCOVERY; CLASSIFICATION; MIXTURES; CANCER;
DOI
10.1007/s10618-013-0317-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be approximated well by a linear subspace estimated by means of a principal component analysis (PCA). The proposed algorithm, Predictive Subspace Clustering (PSC), partitions the data into clusters while simultaneously estimating cluster-wise PCA parameters. The algorithm minimises an objective function that depends upon a new measure of influence for PCA models. A penalised version of the algorithm is also described for carrying out simultaneous subspace clustering and variable selection. The convergence of PSC is discussed in detail, and extensive simulation results and comparisons to competing methods are presented. The comparative performance of PSC has been assessed on six real gene expression data sets, for which PSC often provides state-of-the-art results.
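Illustrative sketch (not from the paper): the abstract's central idea, alternating between fitting one PCA model per cluster and reassigning observations to clusters, can be illustrated with a generic K-subspaces-style loop. The version below is only a simplified stand-in. It reassigns points by in-sample reconstruction error, whereas PSC uses the paper's own influence measure for PCA models, which the abstract does not define, and it omits the penalised variable-selection variant. The function names `fit_pca`, `residuals` and `k_subspaces`, and all parameters, are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, basis); basis columns are the leading principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt from the SVD of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def residuals(X, mean, basis):
    """Squared distance from each row of X to the affine subspace (mean, basis)."""
    Xc = X - mean
    proj = Xc @ basis @ basis.T
    return np.sum((Xc - proj) ** 2, axis=1)

def k_subspaces(X, n_clusters, n_components, n_iter=50, seed=0):
    """Alternate cluster-wise PCA fits and reassignment by reconstruction error.

    Assumption: in-sample residuals replace the paper's predictive criterion.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=X.shape[0])
    models = []
    for _ in range(n_iter):
        models = []
        for k in range(n_clusters):
            members = X[labels == k]
            if members.shape[0] <= n_components:
                # Too few points to fit a subspace: re-seed from random rows.
                idx = rng.choice(X.shape[0], n_components + 1, replace=False)
                members = X[idx]
            models.append(fit_pca(members, n_components))
        # One column of residuals per cluster model.
        errs = np.column_stack([residuals(X, m, B) for m, B in models])
        new_labels = errs.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, models
```

Like K-means, a loop of this kind can stop at a local optimum, so in practice it would usually be run from several random initialisations.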
Pages: 736-772
Number of pages: 37
Related papers
50 in total
  • [21] Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review
    Ghani, Nur Laila Ab
    Aziz, Izzatdin Abdul
    AbdulKadir, Said Jadid
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): : 4649 - 4668
  • [22] Local-Density Subspace Distributed Clustering for High-Dimensional Data
    Geng, Yangli-ao
    Li, Qingyong
    Liang, Mingfei
    Chi, Chong-Yung
    Tan, Juan
    Huang, Heng
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (08) : 1799 - 1814
  • [23] Synchronization-based scalable subspace clustering of high-dimensional data
    Shao, Junming
    Wang, Xinzuo
    Yang, Qinli
    Plant, Claudia
    Boehm, Christian
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 52 (01) : 83 - 111
  • [24] A novel algorithm for fast and scalable subspace clustering of high-dimensional data
    Kaur A.
    Datta A.
    [J]. Journal of Big Data, 2015, 2 (01)
  • [25] Clustering High-Dimensional Data: A Survey on Subspace Clustering, Pattern-Based Clustering, and Correlation Clustering
    Kriegel, Hans-Peter
    Kroeger, Peer
    Zimek, Arthur
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (01)
  • [26] A grid-based subspace clustering algorithm for high-dimensional data streams
    Sun, Yufen
    Lu, Yansheng
    [J]. WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 37 - 48
  • [27] Fast Adaptive K-Means Subspace Clustering for High-Dimensional Data
    Wang, Xiao-Dong
    Chen, Rung-Ching
    Yan, Fei
    Zeng, Zhi-Qiang
    Hong, Chao-Qun
    [J]. IEEE ACCESS, 2019, 7 : 42639 - 42651
  • [28] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [29] ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
    Kavan Fatehi
    Mohsen Rezvani
    Mansoor Fateh
    [J]. Pattern Analysis and Applications, 2020, 23 : 1651 - 1663
  • [30] ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
    Fatehi, Kavan
    Rezvani, Mohsen
    Fateh, Mansoor
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (04) : 1651 - 1663