DHCC: Divisive hierarchical clustering of categorical data

Cited by: 54
Authors
Xiong, Tengke [1]
Wang, Shengrui [1]
Mayers, Andre [1]
Monga, Ernest [2]
Affiliations
[1] Univ Sherbrooke, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
[2] Univ Sherbrooke, Dept Math, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Categorical data; Divisive hierarchical clustering; Subspace clustering; ALGORITHMS; NUMBER;
DOI
10.1007/s10618-011-0221-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering categorical data poses two challenges: defining an inherently meaningful similarity measure, and effectively dealing with clusters that are often embedded in different subspaces. In this paper, we propose a novel divisive hierarchical clustering algorithm for categorical data, named DHCC. We view the task of clustering categorical data from an optimization perspective, and propose effective procedures to initialize and refine the splitting of clusters. The initialization of the splitting is based on multiple correspondence analysis (MCA). We also devise a strategy for deciding when to terminate the splitting process. The proposed algorithm has five merits. First, due to its hierarchical nature, our algorithm yields a dendrogram representing nested groupings of patterns and similarity levels at different granularities. Second, it is parameter-free, fully automatic and, in particular, requires no assumption regarding the number of clusters. Third, it is independent of the order in which the data is processed. Fourth, it is scalable to large data sets. Finally, our algorithm is capable of seamlessly discovering clusters embedded in subspaces, thanks to its use of a novel data representation and Chi-square dissimilarity measures. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.
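The abstract states only that the initial split of a cluster is based on MCA; it does not give the procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation: the data are one-hot encoded into an indicator matrix, standard correspondence analysis is applied to it, and objects are divided by the sign of their first principal coordinate. The function names `one_hot` and `mca_bisect`, the sign-based split rule, and the toy data are all assumptions made for illustration. The refinement and termination steps mentioned in the abstract are not shown.

```python
import numpy as np

def one_hot(data):
    """Indicator matrix: one column per observed (attribute, category) pair."""
    data = np.asarray(data, dtype=object)
    blocks = []
    for j in range(data.shape[1]):
        categories = sorted(set(data[:, j]))
        blocks.append(np.array(
            [[1.0 if v == c else 0.0 for c in categories] for v in data[:, j]]
        ))
    return np.hstack(blocks)

def mca_bisect(Z):
    """Split rows of indicator matrix Z by the sign of the first MCA axis.

    Assumption: every column of Z has at least one nonzero entry,
    which holds when Z comes from one_hot() above.
    """
    P = Z / Z.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals; their SVD yields the principal axes
    # under the chi-square metric.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    first_coord = U[:, 0] * s[0] / np.sqrt(r)   # row principal coordinates
    return first_coord >= 0            # boolean side of the split

# Hypothetical toy data: two groups that differ on every attribute.
data = [["a", "x", "p"], ["a", "x", "p"], ["a", "x", "q"],
        ["b", "y", "r"], ["b", "y", "r"], ["b", "y", "s"]]
labels = mca_bisect(one_hot(data))
print(labels)
```

Which side receives `True` depends on the arbitrary sign of the singular vector, so only the partition itself is meaningful, not the label values.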
Pages: 103 - 135
Page count: 33
Related papers
  • [1] DHCC: Divisive hierarchical clustering of categorical data
    Tengke Xiong
    Shengrui Wang
    André Mayers
    Ernest Monga
    [J]. Data Mining and Knowledge Discovery, 2012, 24 : 103 - 135
  • [2] MGR: An information theory based hierarchical divisive clustering algorithm for categorical data
    Qin, Hongwu
    Ma, Xiuqin
    Herawan, Tutut
    Zain, Jasni Mohamad
    [J]. KNOWLEDGE-BASED SYSTEMS, 2014, 67 : 401 - 411
  • [3] Semi-supervised Parameter-Free Divisive Hierarchical Clustering of Categorical Data
    Xiong, Tengke
    Wang, Shengrui
    Mayers, Andre
    Monga, Ernest
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 265 - 276
  • [4] Ordering of categorical data in hierarchical clustering
    Kazimianec, Michail
    [J]. DATABASES AND INFORMATION SYSTEMS, 2008, : 401 - 404
  • [5] Hierarchical division clustering framework for categorical data
    Wei, Wei
    Liang, Jiye
    Guo, Xinyao
    Song, Peng
    Sun, Yijun
    [J]. NEUROCOMPUTING, 2019, 341 : 118 - 134
  • [6] A hierarchical clustering algorithm for categorical sequence data
    Oh, SJ
    Kim, JY
    [J]. INFORMATION PROCESSING LETTERS, 2004, 91 (03) : 135 - 140
  • [7] A subspace hierarchical clustering algorithm for categorical data
    Carbonera, Joel Luis
    Abel, Mara
    [J]. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 509 - 516
  • [8] Parallel Hierarchical Subspace Clustering of Categorical Data
    Pang, Ning
    Zhang, Jifu
    Zhang, Chaowei
    Qin, Xiao
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (04) : 542 - 555
  • [9] A New MCA-based Divisive Hierarchical Algorithm for Clustering Categorical Data
    Xiong, Tengke
    Wang, Shengrui
    Mayers, Andre
    Monga, Ernest
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 1058 - +
  • [10] Agglomerative and divisive hierarchical Bayesian clustering
    Burghardt, Elliot
    Sewell, Daniel
    Cavanaugh, Joseph
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2022, 176