A cluster centers initialization method for clustering categorical data

Cited by: 62
Authors
Bai, Liang [1, 2]
Liang, Jiye [1 ]
Dang, Chuangyin [2 ]
Cao, Fuyuan [1 ]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
The k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; Genetic algorithm
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the k-modes algorithm can converge to any of numerous local minima, so its performance depends strongly on the initial cluster centers. Most existing cluster center initialization methods are designed for numerical data. Because categorical data lack geometric structure, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select initial cluster centers and overcomes shortcomings of the existing initialization methods for categorical data. Experimental results show that the proposed initialization method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2012 Elsevier Ltd. All rights reserved.
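To make the idea of combining density and distance concrete, here is a minimal Python sketch. It is not the paper's algorithm: the density measure (average relative frequency of an object's attribute values), the selection score (density times distance to the nearest chosen center), and the function names `matching_distance`, `densities`, and `init_centers` are all assumptions made for illustration. The sketch uses the simple matching dissimilarity commonly paired with k-modes and assumes `k` does not exceed the number of distinct objects.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def densities(data):
    """Assumed density: average relative frequency of each object's attribute values.

    Uses per-attribute value counts, so the cost is linear in the number of objects.
    """
    n, m = len(data), len(data[0])
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data, k):
    """Pick k initial centers: the densest object first, then objects that are
    both dense and far from every center chosen so far (hypothetical scoring)."""
    dens = densities(data)
    first = max(range(len(data)), key=lambda i: dens[i])
    chosen = [first]
    # Distance from each object to its nearest chosen center.
    min_dist = [matching_distance(row, data[first]) for row in data]
    while len(chosen) < k:
        nxt = max(range(len(data)), key=lambda i: dens[i] * min_dist[i])
        chosen.append(nxt)
        min_dist = [
            min(d, matching_distance(row, data[nxt]))
            for d, row in zip(min_dist, data)
        ]
    return [data[i] for i in chosen]


data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("c", "z")]
centers = init_centers(data, 2)
print(centers)  # [('a', 'x'), ('b', 'z')]
```

Each of the `k` selection rounds makes one pass over the data, so the sketch runs in time proportional to the number of objects for fixed `k` and attribute count, in line with the linear complexity the abstract claims for the proposed method.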
Pages: 8022-8029
Number of pages: 8