Mining Projected Clusters in High-Dimensional Spaces

Cited by: 30
Authors
Bouguessa, Mohamed [1]
Wang, Shengrui [1]
Affiliations
[1] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Data mining; clustering; high dimensions; algorithms
DOI
10.1109/TKDE.2008.162
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. To address this problem, a number of projected clustering algorithms have been proposed. However, most of them encounter difficulties when clusters hide in subspaces with very low dimensionality. These challenges motivate our effort to propose a robust partitional distance-based projected clustering algorithm. The algorithm consists of three phases. The first phase performs attribute relevance analysis by detecting dense and sparse regions and their location in each attribute. Starting from the results of the first phase, the goal of the second phase is to eliminate outliers, while the third phase aims to discover clusters in different subspaces. The clustering process is based on the K-Means algorithm, with the computation of distance restricted to subsets of attributes where object values are dense. Our algorithm is capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space and avoids the computation of the distance in the full-dimensional space. The suitability of our proposal has been demonstrated through an empirical study using synthetic and real data sets.
Pages: 507-522
Page count: 16
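
The abstract describes a K-Means-style clustering step in which distance is computed only over the attributes where a cluster's values are dense. The following is a minimal sketch of that general idea, not the authors' algorithm: it assumes the relevant attributes of each cluster are already known (the paper derives them in its first phase, which is not reproduced here), and the function name `projected_kmeans`, the `relevant_dims` parameter, and the choice to average squared differences over the selected attributes are all illustrative assumptions.

```python
import numpy as np

def projected_kmeans(X, initial_centers, relevant_dims, n_iter=20):
    """K-Means-style loop with per-cluster restricted distances (illustrative sketch).

    X               : (n_points, n_attributes) array-like
    initial_centers : (k, n_attributes) array-like of starting centers
    relevant_dims   : list of k index lists; relevant_dims[j] holds the attributes
                      assumed relevant to cluster j (hypothetical input)
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(initial_centers, dtype=float)  # copy, so the caller's data is untouched
    k = len(centers)
    labels = None
    for _ in range(n_iter):
        # Assignment: the distance to cluster j uses only relevant_dims[j].
        # Averaging over the selected attributes (an assumption) keeps clusters
        # with different subspace sizes comparable.
        dist = np.empty((len(X), k))
        for j in range(k):
            dims = relevant_dims[j]
            diff = X[:, dims] - centers[j, dims]
            dist[:, j] = np.sqrt(np.mean(diff ** 2, axis=1))
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # Update: recompute each center as the mean of its current members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

# Toy data: the first three points agree only on attributes 0 and 1,
# the last three only on attributes 2 and 3.
X = np.array([
    [0.0, 0.1, 5.0, 9.0],
    [0.1, 0.0, -3.0, 2.0],
    [0.0, 0.0, 8.0, -4.0],
    [7.0, -2.0, 10.0, 10.1],
    [-5.0, 6.0, 10.1, 10.0],
    [3.0, 3.0, 10.0, 10.0],
])
labels, centers = projected_kmeans(X, X[[0, 3]], relevant_dims=[[0, 1], [2, 3]])
print(labels)
```

In the toy data, each group is tight only in its own two attributes and scattered in the others, which is the situation the abstract says full-dimensional distance handles poorly.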