A cluster centers initialization method for clustering categorical data

Cited by: 62
Authors
Bai, Liang [1, 2]
Liang, Jiye [1]
Dang, Chuangyin [2 ]
Cao, Fuyuan [1 ]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; Genetic algorithm
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
k-modes, the leading partitional clustering technique for categorical data, is one of the most computationally efficient clustering methods for such data. However, the k-modes algorithm can converge to any of numerous local minima, so its performance depends strongly on the initial cluster centers. Most existing cluster center initialization methods are designed for numerical data. Because categorical data lack geometric structure, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select initial cluster centers and overcomes shortcomings of existing initialization methods for categorical data. Experimental results show that the proposed method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2012 Elsevier Ltd. All rights reserved.
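The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of combining density and distance when picking initial centers. It is not the authors' method. The density definition (mean relative frequency of an object's attribute values), the simple matching distance, the product score, and the names `hamming`, `density`, and `init_centers` are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def density(data):
    """Assumed density: mean relative frequency of each object's attribute values."""
    n, m = len(data), len(data[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data, k):
    """Pick k initial centers (illustrative heuristic, not the paper's formula).

    The first center is the densest object. Each later center maximizes
    (distance to the nearest chosen center) * (its own density), so that
    chosen centers are both representative and far apart.
    """
    dens = density(data)
    chosen = [max(range(len(data)), key=dens.__getitem__)]
    while len(chosen) < k:
        def score(i):
            return min(hamming(data[i], data[c]) for c in chosen) * dens[i]

        candidates = (i for i in range(len(data)) if i not in chosen)
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


if __name__ == "__main__":
    # Hypothetical toy data set: five objects, two categorical attributes.
    toy = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
    print(init_centers(toy, 2))
```

For a fixed number of clusters and attributes, this sketch does one frequency-counting pass and then a constant number of passes over the data, so its cost grows linearly with the number of objects, in line with the complexity stated in the abstract.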
Pages: 8022-8029
Number of pages: 8
Related papers
  • [1] An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) : 785 - 795
  • [2] A new initialization method for clustering categorical data
    Wu, Shu
    Jiang, Qingshan
    Huang, Joshua Zhexue
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 972 - +
  • [3] A new initialization method for categorical data clustering
    Cao, Fuyuan
    Liang, Jiye
    Bai, Liang
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (07) : 10223 - 10228
  • [4] k-PbC: an improved cluster center initialization for categorical data clustering
    Duy-Tai Dinh
    Van-Nam Huynh
    Applied Intelligence, 2020, 50 : 2610 - 2632
  • [5] k-PbC: an improved cluster center initialization for categorical data clustering
    Duy-Tai Dinh
    Van-Nam Huynh
    APPLIED INTELLIGENCE, 2020, 50 (08) : 2610 - 2632
  • [6] An Initialization Method for Clustering Mixed Numeric and Categorical Data Based on the Density and Distance
    Ji, Jinchao
    Pang, Wei
    Zheng, Yanlin
    Wang, Zhe
    Ma, Zhiqiang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2015, 29 (07)
  • [7] A NEW INITIALIZATION METHOD OF CLUSTER CENTERS
    Pei Jihong
    Fan Jiulun
    Xie Weixin
    Journal of Electronics(China), 1999, (04) : 320 - 326
  • [8] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [9] A Support Based Initialization Algorithm for Categorical Data Clustering
    Kumar, Ajay
    Kumar, Shishir
    JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2018, 11 (02) : 53 - 67
  • [10] Clustering Categorical Data: A Cluster Ensemble Approach
    何增友
    High Technology Letters, 2003, (04) : 8 - 12