Subspace Clustering for High-Dimensional Data Using Cluster Structure Similarity

Cited by: 3
Authors
Fatehi, Kavan [1]
Rezvani, Mohsen [2]
Fateh, Mansoor [3]
Pajoohan, Mohammad-Reza [4]
Affiliations
[1] Yazd Univ, Dept Comp Engn, Informat Proc & Knowledge Discovery Lab, Yazd, Iran
[2] Shahrood Univ Technol, Sch Comp Engn, Dept Comp Engn, Shahrood, Iran
[3] Shahrood Univ Technol, Dept Comp Engn, Shahrood, Iran
[4] Yazd Univ, Dept Comp Engn, Yazd, Iran
Keywords
Algorithm; Cluster Similarity; High Dimensional Data; Subspace Clustering
DOI
10.4018/IJIIT.2018070103
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Because of the curse of dimensionality in high-dimensional data, a significant amount of recent research has addressed subspace clustering, which aims to discover clusters embedded in any possible combination of attributes. The main goal of subspace clustering algorithms is to find all clusters in all subspaces. Previous approaches have mostly generated redundant subspace clusters, which reduces clustering accuracy and increases running time. This article proposes a bottom-up, density-based approach in which cluster structure serves as a similarity measure for generating the optimal subspaces, which in turn raises the accuracy of subspace clustering. Based on this idea, the algorithm discovers similar subspaces by comparing their cluster structures, combines them, and clusters the data again in the new subspaces. Finally, the algorithm determines all the subspaces and finds all clusters within them. Experiments on various synthetic and real datasets show that the proposed approach is significantly better in quality and runtime than the state of the art in clustering high-dimensional data.
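The bottom-up idea in the abstract (cluster each subspace, compare cluster structures, merge similar subspaces, recluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not name the density-based clusterer or the similarity measure, so a minimal DBSCAN and a Jaccard overlap of co-clustered point pairs stand in for them. All function names, the parameters eps, min_pts, and threshold, and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN stand-in; returns one label per point, -1 = noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1               # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])
    return labels


def co_clustered_pairs(labels):
    """Set of index pairs that share a non-noise cluster."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] != -1 and labels[i] == labels[j]}


def structure_similarity(labels_a, labels_b):
    """Assumed measure: Jaccard overlap of co-clustered pairs (1.0 = same structure)."""
    a, b = co_clustered_pairs(labels_a), co_clustered_pairs(labels_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def project(data, dims):
    """Restrict every row to the attributes listed in dims."""
    return [tuple(row[d] for d in dims) for row in data]


def merge_similar_subspaces(data, eps, min_pts, threshold):
    """Start from single attributes, merge similar subspaces, recluster the union."""
    subspaces = {(d,): dbscan(project(data, (d,)), eps, min_pts)
                 for d in range(len(data[0]))}
    merged = True
    while merged:
        merged = False
        for s, t in combinations(sorted(subspaces), 2):
            if structure_similarity(subspaces[s], subspaces[t]) >= threshold:
                union = tuple(sorted(set(s) | set(t)))
                labels = dbscan(project(data, union), eps, min_pts)
                del subspaces[s], subspaces[t]
                subspaces[union] = labels
                merged = True
                break
    return subspaces


# Toy data: attributes 0 and 1 split the points the same way, attribute 2 differently.
data = [((0.0 if i < 5 else 10.0) + 0.1 * (i % 5),
         (0.0 if i < 5 else 5.0) + 0.1 * (i % 5),
         (0.0 if i % 2 == 0 else 10.0) + 0.1 * (i // 2))
        for i in range(10)]
result = merge_similar_subspaces(data, eps=0.5, min_pts=3, threshold=0.8)
print(sorted(result))  # [(0, 1), (2,)]
```

In this toy run, attributes 0 and 1 induce the same grouping of points and are merged into one two-dimensional subspace, while attribute 2 groups the points differently and stays separate. Merging only subspaces with matching structure is one way to avoid enumerating redundant attribute combinations, which is the motivation the abstract gives.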
Pages: 38-55
Number of pages: 18
Related papers
50 in total
  • [1] Subspace selection for clustering high-dimensional data
    Baumgartner, C
    Plant, C
    Kailing, K
    Kriegel, HP
    Kröger, P
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 11 - 18
  • [2] Density Conscious Subspace Clustering for High-Dimensional Data
    Chu, Yi-Hong
    Huang, Jen-Wei
    Chuang, Kun-Ta
    Yang, De-Nian
    Chen, Ming-Syan
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (01) : 16 - 30
  • [3] Subspace Clustering of High-Dimensional Data: An Evolutionary Approach
    Vijendra, Singh
    Laxman, Sahoo
    [J]. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2013, 2013
  • [4] Subspace clustering of high-dimensional data: a predictive approach
    Brian McWilliams
    Giovanni Montana
    [J]. Data Mining and Knowledge Discovery, 2014, 28 : 736 - 772
  • [5] Evolutionary Subspace Clustering Algorithm for High-Dimensional Data
    Nourashrafeddin, S. N.
    Arnold, Dirk V.
    Milios, Evangelos
    [J]. PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012, : 1497 - 1498
  • [6] Subspace clustering of high-dimensional data: a predictive approach
    McWilliams, Brian
    Montana, Giovanni
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 28 (03) : 736 - 772
  • [7] Subspace Clustering of Very Sparse High-Dimensional Data
    Peng, Hankui
    Pavlidis, Nicos
    Eckley, Idris
    Tsalamanis, Ioannis
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 3780 - 3783
  • [8] Cluster Validation for Subspace Clustering on High Dimensional Data
    Chen, Lifei
    Jiang, Qingshan
    Wang, Shengrui
    [J]. 2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4, 2008, : 225 - +
  • [9] A generic framework for efficient subspace clustering of high-dimensional data
    Kriegel, HP
    Kröger, P
    Renz, M
    Wurst, S
    [J]. Fifth IEEE International Conference on Data Mining, Proceedings, 2005, : 250 - 257
  • [10] A Survey on High-Dimensional Subspace Clustering
    Qu, Wentao
    Xiu, Xianchao
    Chen, Huangyue
    Kong, Lingchen
    [J]. MATHEMATICS, 2023, 11 (02)