Subspace clustering of high-dimensional data: a predictive approach

被引:56
|
作者
McWilliams, Brian [2 ]
Montana, Giovanni [1 ]
机构
[1] Univ London Imperial Coll Sci Technol & Med, Dept Math, London, England
[2] ETH, Dept Informat, Zurich, Switzerland
基金
英国工程与自然科学研究理事会;
关键词
Subspace clustering; PCA; PRESS statistics; Variable selection; Model selection; Microarrays; CROSS-VALIDATION; CLASS DISCOVERY; CLASSIFICATION; MIXTURES; CANCER;
D O I
10.1007/s10618-013-0317-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In several application domains, high-dimensional observations are collected and then analysed in search for naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be approximated well by a linear subspace estimated by means of a principal component analysis (PCA). The proposed algorithm, Predictive Subspace Clustering (PSC) partitions the data into clusters while simultaneously estimating cluster-wise PCA parameters. The algorithm minimises an objective function that depends upon a new measure of influence for PCA models. A penalised version of the algorithm is also described for carrying our simultaneous subspace clustering and variable selection. The convergence of PSC is discussed in detail, and extensive simulation results and comparisons to competing methods are presented. The comparative performance of PSC has been assessed on six real gene expression data sets for which PSC often provides state-of-art results.
引用
收藏
页码:736 / 772
页数:37
相关论文
共 50 条
  • [1] Subspace clustering of high-dimensional data: a predictive approach
    Brian McWilliams
    Giovanni Montana
    [J]. Data Mining and Knowledge Discovery, 2014, 28 : 736 - 772
  • [2] Subspace Clustering of High-Dimensional Data: An Evolutionary Approach
    Vijendra, Singh
    Laxman, Sahoo
    [J]. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2013, 2013
  • [3] Subspace selection for clustering high-dimensional data
    Baumgartner, C
    Plant, C
    Kailing, K
    Kriegel, HP
    Kröger, P
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 11 - 18
  • [4] Evolutionary Subspace Clustering Algorithm for High-Dimensional Data
    Nourashrafeddin, S. N.
    Arnold, Dirk V.
    Milios, Evangelos
    [J]. PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012, : 1497 - 1498
  • [5] Density Conscious Subspace Clustering for High-Dimensional Data
    Chu, Yi-Hong
    Huang, Jen-Wei
    Chuang, Kun-Ta
    Yang, De-Nian
    Chen, Ming-Syan
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (01) : 16 - 30
  • [6] Subspace Clustering of Very Sparse High-Dimensional Data
    Peng, Hankui
    Pavlidis, Nicos
    Eckley, Idris
    Tsalamanis, Ioannis
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 3780 - 3783
  • [7] A generic framework for efficient subspace clustering of high-dimensional data
    Kriegel, HP
    Kröger, P
    Renz, M
    Wurst, S
    [J]. Fifth IEEE International Conference on Data Mining, Proceedings, 2005, : 250 - 257
  • [8] A Survey on High-Dimensional Subspace Clustering
    Qu, Wentao
    Xiu, Xianchao
    Chen, Huangyue
    Kong, Lingchen
    [J]. MATHEMATICS, 2023, 11 (02)
  • [9] Density-connected subspace clustering for high-dimensional data
    Kailing, K
    Kriegel, HP
    Kröger, P
    [J]. PROCEEDINGS OF THE FOURTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2004, : 246 - 256
  • [10] Subspace-Weighted Consensus Clustering for High-Dimensional Data
    Cai, Xiaosha
    Huang, Dong
    [J]. ADVANCED DATA MINING AND APPLICATIONS, 2020, 12447 : 3 - 16