Redefining clustering for high-dimensional applications

Cited by: 56
Authors
Aggarwal, CC [1]
Yu, PS [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
data mining; clustering; high dimensions; dimensionality curse
DOI
10.1109/69.991713
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering problems are well known in the database literature for their use in numerous applications, such as customer segmentation, classification, and trend analysis. High-dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that, in high-dimensional data, even the concept of proximity or clustering may not be meaningful. We introduce a very general concept of projected clustering which is able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques, which limit the method to projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high-dimensional applications by searching for hidden subspaces with clusters which are created by interattribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and they can be traded off against accuracy.
Pages: 210–225
Page count: 16
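The abstract names two ideas without showing how they work: clusters that live in arbitrarily aligned (not axis-parallel) subspaces created by interattribute correlations, and extended cluster feature vectors that keep the method scalable. The following is a minimal sketch, not the authors' implementation, under a stated assumption: the extended cluster feature (here the hypothetical class `ExtendedClusterFeature`) is taken to store a point count, per-attribute linear sums, and pairwise product sums, which is enough to rebuild a cluster's covariance matrix without rereading its points. The helper names `least_spread_direction_2d` and `projected_energy` are also hypothetical, and the closed-form eigenvector is limited to two attributes for readability.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


class ExtendedClusterFeature:
    """Hypothetical extended cluster feature (ECF) summary of one cluster.

    Assumed layout: point count n, per-attribute linear sums, and the d x d
    matrix of pairwise product sums. This is an illustration only.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0
        self.linear = [0.0] * dim
        self.products = [[0.0] * dim for _ in range(dim)]

    def add(self, point: Vector) -> None:
        """Absorb one point into the summary in O(d^2) time."""
        self.n += 1
        for i in range(self.dim):
            self.linear[i] += point[i]
            for j in range(self.dim):
                self.products[i][j] += point[i] * point[j]

    def merge(self, other: "ExtendedClusterFeature") -> "ExtendedClusterFeature":
        """Summaries are additive, so merging two clusters is entry-wise addition."""
        out = ExtendedClusterFeature(self.dim)
        out.n = self.n + other.n
        out.linear = [a + b for a, b in zip(self.linear, other.linear)]
        out.products = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.products, other.products)
        ]
        return out

    def covariance(self) -> Matrix:
        """Population covariance rebuilt from the summary: E[x_i x_j] - E[x_i] E[x_j]."""
        mean = [s / self.n for s in self.linear]
        return [
            [self.products[i][j] / self.n - mean[i] * mean[j] for j in range(self.dim)]
            for i in range(self.dim)
        ]


def least_spread_direction_2d(cov: Matrix) -> Tuple[float, float]:
    """Unit eigenvector of a 2x2 covariance matrix for its smallest eigenvalue."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    if b == 0.0:
        # Uncorrelated attributes: the least-spread direction is an original axis.
        return (1.0, 0.0) if a <= c else (0.0, 1.0)
    lam = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)
    vx, vy = lam - c, b  # satisfies (cov - lam * I) v = 0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)


def projected_energy(cov: Matrix, direction: Vector) -> float:
    """Mean squared deviation from the centroid along a unit direction: e^T C e."""
    d = len(direction)
    return sum(direction[i] * cov[i][j] * direction[j] for i in range(d) for j in range(d))


# Usage: two perfectly correlated attributes, summarized in two halves and merged.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
left, right = ExtendedClusterFeature(2), ExtendedClusterFeature(2)
for p in points[:2]:
    left.add(p)
for p in points[2:]:
    right.add(p)
cov = left.merge(right).covariance()
e = least_spread_direction_2d(cov)
print(projected_energy(cov, (1.0, 0.0)))  # spread along the original x axis: 1.25
print(projected_energy(cov, e))           # spread along the diagonal direction: approximately 0
```

In this toy cluster neither original attribute has a small spread, yet the spread along the diagonal direction is essentially zero; that is the kind of correlation-induced, arbitrarily oriented subspace an axis-parallel projection cannot capture. With more than two attributes, the closed form would give way to a general symmetric eigensolver such as `numpy.linalg.eigh`.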