Subspace Clustering for High-Dimensional Data Using Cluster Structure Similarity

Cited by: 3

Authors:

Fatehi, Kavan [1]

Rezvani, Mohsen [2]

Fateh, Mansoor [3]

Pajoohan, Mohammad-Reza [4]

Affiliations:

[1] Yazd Univ, Dept Comp Engn, Informat Proc & Knowledge Discovery Lab, Yazd, Iran

[2] Shahrood Univ Technol, Sch Comp Engn, Dept Comp Engn, Shahrood, Iran

[3] Shahrood Univ Technol, Dept Comp Engn, Shahrood, Iran

[4] Yazd Univ, Dept Comp Engn, Yazd, Iran

Source:

INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES | 2018 / Volume 14 / Issue 03

Keywords:

Algorithm; Cluster Similarity; High Dimensional Data; Subspace Clustering

DOI:

10.4018/IJIIT.2018070103

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Because of the curse of dimensionality in high-dimensional data, a significant amount of recent research has addressed subspace clustering, which aims to discover clusters embedded in any possible combination of attributes. The main goal of subspace clustering algorithms is to find all clusters in all subspaces. Previous approaches have mostly generated redundant subspace clusters, which reduces clustering accuracy and increases the running time of the algorithms. This article proposes a bottom-up, density-based approach in which the cluster structure serves as a similarity measure for generating the optimal subspaces, which in turn raises the accuracy of the subspace clustering. The algorithm discovers similar subspaces by comparing their cluster structures, combines them, and clusters the data again in the new subspaces. Finally, the algorithm determines all the subspaces and finds all clusters within them. Experiments on various synthetic and real datasets show that the proposed approach achieves significantly better quality and runtime than the state of the art in clustering high-dimensional data.
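
The abstract outlines a bottom-up pipeline: cluster low-dimensional subspaces, compare their cluster structures, merge the similar ones, and cluster again in the merged subspaces. The following is a minimal sketch of that general idea only, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names (`density_cluster`, `structure_similarity`, `merge_similar_subspaces`), the simple connected-component stand-in for a density-based clusterer, the pairwise Jaccard measure of cluster structure similarity, and the parameters `eps`, `min_pts`, and `threshold`.

```python
from itertools import combinations


def density_cluster(data, dims, eps, min_pts):
    """Assumed stand-in for a density-based clusterer: label points by
    connected components of the eps-neighbourhood graph (Chebyshev distance
    restricted to `dims`). Components smaller than min_pts are noise (-1)."""
    n = len(data)
    labels = [None] * n          # None means "not visited yet"
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = -1       # visited; stays -1 if the component is too small
        component, stack = [start], [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and all(
                    abs(data[i][d] - data[j][d]) <= eps for d in dims
                ):
                    labels[j] = -1
                    component.append(j)
                    stack.append(j)
        if len(component) >= min_pts:
            for i in component:
                labels[i] = next_label
            next_label += 1
    return labels


def structure_similarity(a, b):
    """Assumed similarity measure: Jaccard index over the sets of point
    pairs that share a (non-noise) cluster in each labelling."""
    def co_pairs(labels):
        return {
            (i, j)
            for i, j in combinations(range(len(labels)), 2)
            if labels[i] != -1 and labels[i] == labels[j]
        }
    pairs_a, pairs_b = co_pairs(a), co_pairs(b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def merge_similar_subspaces(data, eps=0.5, min_pts=3, threshold=0.8):
    """Cluster each attribute alone, group attributes whose cluster
    structures are similar, then cluster again in each merged subspace."""
    n_dims = len(data[0])
    base = {d: density_cluster(data, (d,), eps, min_pts) for d in range(n_dims)}

    parent = list(range(n_dims))  # union-find over attribute indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d1, d2 in combinations(range(n_dims), 2):
        if structure_similarity(base[d1], base[d2]) >= threshold:
            parent[find(d2)] = find(d1)

    groups = {}
    for d in range(n_dims):
        groups.setdefault(find(d), []).append(d)
    return {
        tuple(dims): density_cluster(data, dims, eps, min_pts)
        for dims in groups.values()
    }
```

Calling `merge_similar_subspaces(data)` on a list of equal-length numeric rows returns a dictionary that maps each merged subspace (a tuple of attribute indices) to one cluster label per row. The quadratic neighbour search keeps the sketch short; a practical implementation would likely use a spatial index and a full density-based method.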

Pages: 38-55

Page count: 18
