An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data

Cited by: 57
Authors
Bai, Liang [1, 2]
Liang, Jiye [1]
Dang, Chuangyin [2]
Affiliations
[1] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
The k-modes-type algorithms; Categorical data; Initial cluster centers; The number of clusters; Density measure; K-MEANS ALGORITHM; GENETIC ALGORITHM; APPROXIMATION; REDUCTION;
DOI
10.1016/j.knosys.2011.02.015
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the clustering performance of k-modes-type algorithms depends on the initial cluster centers, and the number of clusters must be known or given in advance. This paper proposes a novel initialization method for categorical data that can be applied to k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion for finding candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results show that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets. (C) 2011 Elsevier B.V. All rights reserved.
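The dependence on initial cluster centers noted in the abstract can be illustrated with a small sketch. The code below is not the initialization method proposed in the paper; it is a plain k-modes loop using the simple matching dissimilarity, run on a hypothetical six-row toy data set. The function names (`matching_distance`, `mode_of`, `assign`, `k_modes`, `cost`) and the data are assumptions made for illustration only.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(rows):
    """Attribute-wise most frequent value of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def assign(data, centers):
    """Label each row with the index of its nearest center (first index wins ties)."""
    return [
        min(range(len(centers)), key=lambda j: matching_distance(row, centers[j]))
        for row in data
    ]


def k_modes(data, centers, max_iter=100):
    """Alternate assignment and mode updates until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [row for row, lab in zip(data, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(mode_of(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return assign(data, centers), centers


def cost(data, labels, centers):
    """Total dissimilarity between each row and the center of its cluster."""
    return sum(matching_distance(row, centers[lab]) for row, lab in zip(data, labels))


# Hypothetical toy data: two categorical attributes, two natural groups.
data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("c", "z")]

# Initial centers taken from different groups versus from the same group.
good_labels, good_centers = k_modes(data, [("a", "x"), ("b", "z")])
bad_labels, bad_centers = k_modes(data, [("a", "x"), ("a", "y")])

good_cost = cost(data, good_labels, good_centers)
bad_cost = cost(data, bad_labels, bad_centers)
print(good_cost, bad_cost)
```

With both initial centers drawn from the same group, the loop converges to a partition with a higher total cost than when the centers start in different groups, which is the kind of sensitivity an initialization method aims to avoid.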
Pages: 785-795
Number of pages: 11