Categorical Data Clustering with Automatic Selection of Cluster Number

Cited by: 9
Authors
Liao, Hai-Yong [1, 2]
Ng, Michael K. [1, 2]
Affiliations
[1] Hong Kong Baptist Univ, Ctr Math Imaging & Vis, Kowloon Tong, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Categorical data; Clustering; Penalty; Regularization parameter
DOI
10.1007/s12543-009-0001-5
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper, we investigate the problem of determining the number of clusters in k-modes-based categorical data clustering. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by adding a penalty term to the objective function so that more clusters compete for objects. In the new objective function, a regularization parameter controls the number of clusters produced by the clustering process. Instead of finding k directly, we choose a value of the regularization parameter such that the corresponding clustering result is the most stable one among all the generated clustering results. Experimental results on synthetic and real data sets demonstrate the effectiveness of the proposed algorithm.
Pages: 5-25
Number of pages: 21
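
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the standard k-modes iteration (simple matching dissimilarity, per-attribute mode update). It is not the authors' algorithm: the abstract does not state the form of the penalty term or how the regularization parameter enters the objective, so both are left out here. The function names, the explicit `initial_modes` argument, and the toy data in the usage note are assumptions made for illustration only.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category of each attribute over the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_modes, max_iter=100):
    """Plain k-modes with a fixed k = len(initial_modes); max_iter must be >= 1.

    Assumption: no penalty term is included, because the abstract does not
    specify one. Returns (labels, modes, cost).
    """
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: recompute the mode of every non-empty cluster.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    cost = sum(matching_dissimilarity(row, modes[lab])
               for row, lab in zip(data, labels))
    return labels, modes, cost
```

A possible usage, on made-up data: `k_modes(data, [data[0], data[3]])` with six three-attribute objects returns the cluster labels, the final modes, and the total matching cost. In the approach the abstract describes, k would not be fixed in advance like this; the regularization parameter would be varied and the most stable clustering kept.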