A cluster centers initialization method for clustering categorical data

Cited by: 62
Authors
Bai, Liang [1 ,2 ]
Liang, Jiye [1 ]
Dang, Chuangyin [2 ]
Cao, Fuyuan [1 ]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
The k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; Genetic algorithm
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the k-modes algorithm can converge to any of numerous local minima, so its performance depends strongly on the initial cluster centers. Currently, most methods for initializing cluster centers are designed for numerical data. Because categorical data lack geometric structure, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select initial cluster centers and overcomes shortcomings of the existing initialization methods for categorical data. Experimental results show that the proposed initialization method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2012 Elsevier Ltd. All rights reserved.
Pages: 8022-8029
Page count: 8
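
The abstract describes selecting initial k-modes centers by combining density and distance, but this record does not give the paper's formulas. The following is only a minimal sketch under stated assumptions, not the authors' method: density is assumed to be the mean relative frequency of an object's attribute values, distance is assumed to be the simple matching (Hamming) dissimilarity, and each further center is assumed to maximize density times the distance to its nearest already-chosen center. The names `hamming`, `densities`, and `init_centers` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def densities(data):
    """Assumed density: mean relative frequency of each object's attribute values."""
    n, m = len(data), len(data[0])
    # One frequency table per attribute, built in a single pass over the data.
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data, k):
    """Pick k initial centers from data (a list of equal-length tuples)."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of objects")
    dens = densities(data)
    # First center: the object with the highest density.
    chosen = [max(range(len(data)), key=dens.__getitem__)]
    while len(chosen) < k:
        def score(i):
            # Assumed criterion: dense objects far from every chosen center.
            return min(hamming(data[i], data[c]) for c in chosen) * dens[i]

        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
centers = init_centers(data, 2)
```

For a fixed number of clusters and attributes, both the density pass and each selection round touch every object a bounded number of times, so this sketch is linear in the number of objects, consistent with the complexity claim in the abstract.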
Related papers
50 in total
  • [21] A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
    Iam-On, Natthakan
    Boongoen, Tossapon
    Garrett, Simon
    Price, Chris
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (03) : 413 - 425
  • [22] An Improved Initialization Method for Clustering High-Dimensional Data
    Zhang, Yanping
    Jiang, Qingshan
    2010 2ND INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS PROCEEDINGS (DBTA), 2010,
  • [23] Weighted Delta Factor Cluster Ensemble Algorithm for Categorical Data Clustering in Data Mining
    Sengottaian, Sarumathi
    Natesan, Shanthi
    Mathivanan, Sharmila
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (03) : 275 - 284
  • [24] Data Labeling method based on Rough Entropy for Categorical Data Clustering
    Sreenivasulu, G.
    Raju, S. Viswanadha
    Rao, N. Sambasiva
    2014 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATION AND COMPUTATIONAL ENGINEERING (ICECCE), 2014, : 173 - 178
  • [25] A fuzzy hierarchical clustering method for clustering documents based on dynamic cluster centers
    Chen, Shyi-Ming
    Chen, Liang-Yu
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2007, 30 (01) : 169 - 172
  • [26] A novel fuzzy clustering algorithm with between-cluster information for categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    Cao, Fuyuan
    FUZZY SETS AND SYSTEMS, 2013, 215 : 55 - 73
  • [27] A method for k-means-like clustering of categorical data
    Nguyen T.-H.T.
    Dinh D.-T.
    Sriboonchitta S.
    Huynh V.-N.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (11) : 15011 - 15021
  • [28] Clustering Categorical Data Using a Swarm-based Method
    Izakian, Hesam
    Abraham, Ajith
    Snasel, Vaclav
    2009 WORLD CONGRESS ON NATURE & BIOLOGICALLY INSPIRED COMPUTING (NABIC 2009), 2009, : 1719 - +
  • [29] On data labeling for clustering categorical data
    Chen, Hung-Leng
    Chuang, Kun-Ta
    Chen, Ming-Syan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1458 - 1471
  • [30] A Fast Density Peak Clustering Method with Autoselect Cluster Centers
    Wang, Zhihe
    Li, Yongbiao
    Du, Hui
    Wei, Xiaofen
    MOBILE INFORMATION SYSTEMS, 2022, 2022