Categorical Data Clustering with Automatic Selection of Cluster Number

Cited by: 9
Authors
Liao, Hai-Yong [1, 2]
Ng, Michael K. [1, 2]
Affiliations
[1] Hong Kong Baptist Univ, Ctr Math Imaging & Vis, Kowloon Tong, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Categorical data; Clustering; Penalty; Regularization parameter
DOI
10.1007/s12543-009-0001-5
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
In this paper, we investigate the problem of determining the number of clusters k in k-modes-based categorical data clustering. We propose a new categorical data clustering algorithm that selects k automatically. The new algorithm extends the k-modes clustering algorithm by adding a penalty term to the objective function so that more clusters compete for objects. In the new objective function, a regularization parameter controls the number of clusters produced by the clustering process. Instead of finding k directly, we choose a value of the regularization parameter such that the corresponding clustering result is the most stable among all generated clustering results. Experimental results on synthetic and real data sets demonstrate the effectiveness of the proposed algorithm.
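For orientation, the baseline that the abstract says is being extended can be sketched as follows. This is a minimal illustration of the standard k-modes procedure (Hamming dissimilarity, per-attribute modes) with a fixed k, not the authors' algorithm: the abstract does not give the form of the penalty term or the stability criterion, so neither is reproduced here. The function names `hamming` and `k_modes` and the toy data are assumptions made for this example.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, init_modes, max_iter=100):
    """Plain k-modes with a fixed number of clusters (len(init_modes)).

    Assumption: initial modes are supplied by the caller; the record above
    does not say how the paper initializes them.
    """
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mode takes the most frequent category per attribute.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # leave an empty cluster's mode unchanged
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    # Objective value: total dissimilarity of objects to their cluster modes.
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


# Hypothetical toy data: six objects described by three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
labels, modes, cost = k_modes(data, init_modes=[data[0], data[3]])
```

In this baseline k is fixed by the number of initial modes; according to the abstract, the proposed method instead varies a regularization parameter and keeps the most stable resulting clustering.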
Pages: 5-25
Page count: 21
Related papers
50 in total
  • [31] Subtractive Clustering for Categorical Data
    Gu, Lei
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 1229 - 1232
  • [32] Evaluation of Categorical Data Clustering
    Rezankova, Hana
    Loster, Tomas
    Husek, Dusan
    ADVANCES IN INTELLIGENT WEB MASTERING 3, 2011, 86 : 173 - 182
  • [33] Clustering Categorical Data: A Survey
    Naouali, Sami
    Ben Salem, Semeh
    Chtourou, Zied
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2020, 19 (01) : 49 - 96
  • [34] Identifying cluster number for subspace projected functional data clustering
    Li, Pai-Ling
    Chiou, Jeng-Min
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2011, 55 (06) : 2090 - 2103
  • [35] Automatic selection of the number of clusters in multidimensional data problems
    Marazzi, A
    Gamba, P
    Mecocci, A
    Semboloni, A
    INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, PROCEEDINGS - VOL III, 1996, : 631 - 634
  • [36] Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering
    Reddy, H. Venkateswara
    Raju, S. Viswanadha
    Agrawal, Pratibha
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 500 - 506
  • [37] A data labeling method for clustering categorical data
    Cao, Fuyuan
    Liang, Jiye
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 2381 - 2385
  • [38] Data Reduction Method for Categorical Data Clustering
    Rendon, Erendira
    Salvador Sanchez, J.
    Garcia, Rene A.
    Abundez, Itzel
    Gutierrez, Citlalih
    Gasca, Eduardo
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2008, PROCEEDINGS, 2008, 5290 : 143 - +
  • [39] Data and cluster weighting in target selection based on fuzzy clustering
    Kaymak, U
    FUZZY SETS AND SYSTEMS - IFSA 2003, PROCEEDINGS, 2003, 2715 : 568 - 575
  • [40] Video shot spectral clustering algorithm by optimized automatic cluster model selection
    Zhang, Jianning
    Sun, Lifeng
    Zhong, Yuzhuo
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2007, 47 (10) : 1700 - 1703