ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

Cited by: 5
Authors
Fatehi, Kavan [1]
Rezvani, Mohsen [2]
Fateh, Mansoor [2]
Affiliations
[1] Yazd Univ, Yazd, Iran
[2] Shahrood Univ Technol, Shahrood, Iran
Keywords
High-dimensional data; Subspace clustering; Cluster similarity; Density
DOI
10.1007/s10044-020-00884-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The curse of dimensionality is one of the major challenges in clustering high-dimensional data. A considerable body of recent literature addresses this challenge through subspace clustering, whose main objective is to discover clusters embedded in any possible combination of the attributes. Most previous methods, however, generate redundant subspace clusters, which reduces clustering accuracy and increases running time. In this paper, a bottom-up density-based approach is proposed for clustering high-dimensional data. We use the cluster structure as a similarity measure to generate optimal subspaces, which raises the accuracy of subspace clustering. Building on this idea, we propose an iterative algorithm that discovers similar subspaces from the similarity of their features. In each iteration, the algorithm first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the resulting subspaces. It repeats these steps until it converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach achieves significantly better results in both quality and runtime than state-of-the-art methods for clustering high-dimensional data. Its accuracy is around 34% higher than that of CLIQUE and around 6% higher than that of DiSH.
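The iterative loop described in the abstract (find similar subspaces, combine them, re-cluster) can be illustrated with a small Python sketch. This is not the authors' ASCRClu implementation; it is a minimal sketch under our own assumptions: each subspace is clustered with a simple DBSCAN-style routine, the similarity of two subspaces is the Rand index between their cluster labellings (one possible reading of "cluster structure as a similarity measure"), and the loop greedily merges the single most similar pair per iteration. All function names and the parameters `eps`, `min_pts`, and `threshold` are hypothetical.

```python
# Hypothetical sketch of a bottom-up "combine similar subspaces, then re-cluster"
# loop. Not the ASCRClu algorithm from the paper; names and parameters are assumed.
from itertools import combinations
from math import dist


def density_clusters(points, dims, eps, min_pts):
    """DBSCAN-style labelling of `points` projected onto the axes in `dims`.

    Returns one label per point; -1 marks noise.
    """
    proj = [tuple(p[d] for d in dims) for p in points]
    n = len(proj)
    # Epsilon-neighbourhoods (each point counts as its own neighbour).
    neighbors = [[j for j in range(n) if dist(proj[i], proj[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels


def structure_similarity(a, b):
    """Rand index: fraction of point pairs on which two labellings agree.

    A pair counts as 'together' only if both points share a non-noise label.
    """
    agree = total = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j] != -1
        same_b = b[i] == b[j] != -1
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


def combine_and_recluster(points, eps=1.0, min_pts=3, threshold=0.9):
    """Start from 1-D subspaces; repeatedly merge the most similar pair and re-cluster."""
    subspaces = [frozenset([d]) for d in range(len(points[0]))]
    labels = {s: density_clusters(points, sorted(s), eps, min_pts) for s in subspaces}
    while True:
        best = None
        for s, t in combinations(subspaces, 2):
            sim = structure_similarity(labels[s], labels[t])
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, s, t)
        if best is None:  # no pair is similar enough: converged
            break
        _, s, t = best
        merged = s | t
        subspaces = [u for u in subspaces if u not in (s, t)] + [merged]
        # Re-cluster the points in the new, higher-dimensional subspace.
        labels[merged] = density_clusters(points, sorted(merged), eps, min_pts)
    return {tuple(sorted(s)): labels[s] for s in subspaces}


if __name__ == "__main__":
    # Toy data: dims 0 and 1 share the same two-cluster structure,
    # while dim 2 groups the points differently.
    pts = [
        (0.0, 0.0, 0.0), (0.5, 0.5, 0.3), (0.2, 0.3, 20.0), (0.4, 0.1, 20.3),
        (10.0, 10.0, 0.1), (10.5, 10.2, 0.2), (10.3, 10.4, 20.1), (10.1, 10.5, 20.2),
    ]
    print(combine_and_recluster(pts, eps=1.0, min_pts=3, threshold=0.9))
```

On this toy data, dimensions 0 and 1 yield identical labellings (Rand index 1.0) and are merged into one 2-D subspace, while dimension 2 stays separate (similarity 12/28, about 0.43). Using one fixed `eps` for subspaces of different dimensionality is a simplification of this sketch, since distances tend to grow as dimensions are added.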
Pages: 1651-1663
Number of pages: 13
Related papers
50 in total
  • [41] Clustering High-Dimensional Data: A Survey on Subspace Clustering, Pattern-Based Clustering, and Correlation Clustering
    Kriegel, Hans-Peter
    Kroeger, Peer
    Zimek, Arthur
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (01)
  • [42] PARTCAT: A subspace clustering algorithm for high dimensional categorical data
    Gan, Guojun
    Wu, Jianhong
    Yang, Zijiang
    2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 4406 - +
  • [43] Adaptive Dimensionality Reduction Method for High-dimensional Data
    Duan, Shuyong
    Yang, Jianhua
    Han, Xu
    Liu, Guirong
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2024, 60 (17) : 283 - 296
  • [44] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [45] Clustering High-Dimensional Data
    Masulli, Francesco
    Rovetta, Stefano
    CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627 : 1 - 13
  • [46] Subspace clustering of high dimensional data
    Domeniconi, C
    Papadopoulos, D
    Gunopulos, D
    Ma, S
    Proceedings of the Fourth SIAM International Conference on Data Mining, 2004, : 517 - 521
  • [47] A Hybrid EA for High-dimensional Subspace Clustering Problem
    Lin, Lin
    Gen, Mitsuo
    Liang, Yan
    2014 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2014, : 2855 - 2860
  • [48] Persistent homology based clustering algorithm for high-dimensional data
    Xiong Z.
    Wei Y.
    Xiong Z.
    He K.
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, 52 (02) : 29 - 35
  • [49] A Clustering Algorithm for High-Dimensional Nonlinear Feature Data with Applications
    Jiang H.
    Wang G.
    Gao J.
    Gao Z.
    Gao R.
    Guo Q.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (12) : 49 - 55, 90
  • [50] Adaptive dimension reduction for clustering high dimensional data
    Ding, C
    He, XF
    Zha, HY
    Simon, HD
    2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 147 - 154