Categorical Data Clustering with Automatic Selection of Cluster Number

Cited by: 9
Authors
Liao, Hai-Yong [1 ,2 ]
Ng, Michael K. [1 ,2 ]
Affiliations
[1] Hong Kong Baptist Univ, Ctr Math Imaging & Vis, Kowloon Tong, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Categorical data; Clustering; Penalty; Regularization parameter
DOI
10.1007/s12543-009-0001-5
Chinese Library Classification number
O29 [Applied mathematics]
Subject classification code
070104
Abstract
In this paper, we investigate the problem of determining the number of clusters in k-modes-based categorical data clustering. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by adding a penalty term to the objective function so that more clusters compete for objects. In the new objective function, a regularization parameter controls the number of clusters produced by the clustering process. Instead of finding k directly, we choose a value of the regularization parameter such that the corresponding clustering result is the most stable among all the generated clustering results. Experimental results on synthetic and real data sets demonstrate the effectiveness of the proposed algorithm.
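The abstract does not state the form of the penalty term or the stability criterion, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes a penalty of `-lam * log(share_j)` on assigning an object to cluster `j` (where `share_j` is the fraction of objects currently in that cluster), drops clusters that become empty, and treats the cluster count that recurs most often over a grid of `lam` values as the "most stable" one. All function names, the toy data, and the grid are hypothetical.

```python
from collections import Counter
import math

def hamming(a, b):
    """Simple matching dissimilarity used by k-modes: count of mismatched attributes."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(rows):
    """Attribute-wise most frequent category of a group of objects."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def penalized_kmodes(data, init_modes, lam, max_iter=50):
    """k-modes with an ASSUMED penalty (not taken from the paper):
    cost(x, j) = hamming(x, mode_j) - lam * log(share_j).
    Small clusters become expensive, lose objects, and are dropped once empty."""
    modes = list(init_modes)
    shares = [1.0 / len(modes)] * len(modes)
    labels = None
    for _ in range(max_iter):
        assign = [
            min(range(len(modes)),
                key=lambda j: hamming(x, modes[j]) - lam * math.log(shares[j]))
            for x in data
        ]
        kept = sorted(set(assign))                      # clusters that still own objects
        remap = {old: new for new, old in enumerate(kept)}
        new_labels = [remap[j] for j in assign]
        members = [[x for x, l in zip(data, new_labels) if l == c]
                   for c in range(len(kept))]
        modes = [mode_of(m) for m in members]           # k-modes centre update
        shares = [len(m) / len(data) for m in members]
        if new_labels == labels:                        # partition unchanged: converged
            break
        labels = new_labels
    return labels, modes

def select_by_stability(data, init_modes, lams):
    """ASSUMED stability rule: the cluster count seen most often across the grid."""
    counts = [len(penalized_kmodes(data, init_modes, lam)[1]) for lam in lams]
    return Counter(counts).most_common(1)[0][0], counts

# Toy data: two groups of categorical objects with a little noise.
data = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p"), ("b", "x", "q"),
]
# Deliberately start with more clusters than the data needs.
init_modes = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "q"), ("b", "y", "p")]
lams = [0.0, 0.5, 1.0, 1.5, 2.0, 4.0]

best_k, counts = select_by_stability(data, init_modes, lams)
print(best_k, counts)
```

With `lam = 0` the sketch reduces to plain k-modes and keeps all four initial clusters. Larger values merge clusters, and a very large value collapses everything into one, which is why the choice is made over a range of values instead of a single setting.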
Pages: 5-25
Number of pages: 21
Related papers
50 in total
  • [21] A Data Labeling method for Categorical Data Clustering using Cluster Entropies in Rough Sets
    Reddy, H. Venkateswara
    Kumar, B. Suresh
    Raju, S. Viswanadha
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 444 - 449
  • [22] Weighted Delta Factor Cluster Ensemble Algorithm for Categorical Data Clustering in Data Mining
    Sengottaian, Sarumathi
    Natesan, Shanthi
    Mathivanan, Sharmila
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (03) : 275 - 284
  • [23] NSS-AKmeans: An Agglomerative Fuzzy K-Means Clustering Method with Automatic Selection of Cluster Number
    Zhang, Yanfeng
    Xu, Xiaofei
    Ye, Yunming
    2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 2, 2010, : 32 - 38
  • [24] Clustering categorical data using Qualified Nearest Neighbors Selection model
    Jin, Yang
    Zuo, Wanli
    AI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4304 : 1037 - +
  • [25] Clustering ensemble selection for categorical data based on internal validity indices
    Zhao, Xingwang
    Liang, Jiye
    Dang, Chuangyin
    PATTERN RECOGNITION, 2017, 69 : 150 - 168
  • [26] k-PbC: an improved cluster center initialization for categorical data clustering
    Duy-Tai Dinh
    Van-Nam Huynh
    APPLIED INTELLIGENCE, 2020, 50 (08) : 2610 - 2632
  • [27] k-PbC: an improved cluster center initialization for categorical data clustering
    Duy-Tai Dinh
    Van-Nam Huynh
    Applied Intelligence, 2020, 50 : 2610 - 2632
  • [28] A novel fuzzy clustering algorithm with between-cluster information for categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    Cao, Fuyuan
    FUZZY SETS AND SYSTEMS, 2013, 215 : 55 - 73
  • [29] On data labeling for clustering categorical data
    Chen, Hung-Leng
    Chuang, Kun-Ta
    Chen, Ming-Syan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1458 - 1471
  • [30] Clustering categorical data streams
    He, Zengyou
    Xu, Xiaofei
    Deng, Shengchun
    Huang, Joshua Zhexue
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2011, 11 (04) : 185 - 192