A new initialization method for categorical data clustering

Cited by: 119
Authors
Cao, Fuyuan [1 ,2 ]
Liang, Jiye [1 ,2 ]
Bai, Liang [1 ]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Minist Educ Res, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Density; Distance; Initialization method; Initial cluster center; k-modes algorithm
DOI
10.1016/j.eswa.2009.01.060
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In clustering algorithms, choosing a subset of representative examples from a data set is very important. Such "exemplars" can be found by randomly choosing an initial subset of data objects and then iteratively refining it, but this works well only if the initial choice is close to a good solution. In this paper, the average density of an object is defined based on the frequency of attribute values. Building on this, a novel initialization method for categorical data is proposed that takes into account both the distance between objects and the density of each object. We apply the proposed initialization method to the k-modes and fuzzy k-modes algorithms. Experimental results show that the proposed method outperforms random initialization and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2009 Elsevier Ltd. All rights reserved.
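The abstract does not give the formulas, so the following is only a minimal Python sketch of the general density-and-distance idea, under stated assumptions rather than the paper's exact definitions. We assume (1) the average density of an object is the mean, over all attributes, of the fraction of objects sharing its value on that attribute; (2) dissimilarity is the simple matching distance (number of differing attributes); and (3) the first center is the densest object, and each later center maximizes its density multiplied by its distance to the nearest already-chosen center. The function names `average_density`, `matching_distance`, and `init_centers` are hypothetical. Each selection round costs O(nm) for n objects and m attributes, so choosing k centers takes O(knm) time, which is linear in n.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Obj = Tuple[str, ...]  # a categorical object: one value per attribute


def average_density(data: Sequence[Obj]) -> List[float]:
    """ASSUMED definition: for each object x, the mean over attributes a of
    |{y in data : y[a] == x[a]}| / n. Uses per-attribute value counters,
    so the whole pass is O(n * m)."""
    n, m = len(data), len(data[0])
    counters = [Counter(row[a] for row in data) for a in range(m)]
    return [sum(counters[a][row[a]] for a in range(m)) / (n * m) for row in data]


def matching_distance(x: Obj, y: Obj) -> int:
    """Simple matching dissimilarity: count of attributes whose values differ."""
    return sum(1 for xa, ya in zip(x, y) if xa != ya)


def init_centers(data: Sequence[Obj], k: int) -> List[int]:
    """ASSUMED greedy rule: the first center is the densest object; each later
    center maximizes density(x) * (distance from x to its nearest chosen center).
    Returns the indices of the chosen objects."""
    dens = average_density(data)
    first = max(range(len(data)), key=lambda i: dens[i])  # ties -> lowest index
    centers = [first]
    # Distance from every object to its nearest chosen center, updated incrementally.
    min_dist = [matching_distance(row, data[first]) for row in data]
    while len(centers) < k:
        best = max(range(len(data)), key=lambda i: dens[i] * min_dist[i])
        if dens[best] * min_dist[best] == 0:
            # Every remaining object coincides with an existing center.
            raise ValueError("fewer than k distinct objects in data")
        centers.append(best)
        min_dist = [min(d, matching_distance(row, data[best]))
                    for d, row in zip(min_dist, data)]
    return centers
```

For example, with `data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]`, `init_centers(data, 2)` returns `[0, 3]`: objects 0 and 1 tie as the densest (density 0.5), and object 3 (tied with object 4) has the largest density-times-distance product. Multiplying by density rather than using distance alone is meant to keep isolated outliers, which are far from everything but have low density, from being picked as centers.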
Pages: 10223-10228
Number of pages: 6
Related papers
(50 in total; first 10 listed)
  • [1] A new initialization method for clustering categorical data
    Wu, Shu
    Jiang, Qingshan
    Huang, Joshua Zhexue
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 972 - +
  • [2] A cluster centers initialization method for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    Cao, Fuyuan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (09) : 8022 - 8029
  • [3] An Initialization Method for Clustering Mixed Numeric and Categorical Data Based on the Density and Distance
    Ji, Jinchao
    Pang, Wei
    Zheng, Yanlin
    Wang, Zhe
    Ma, Zhiqiang
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2015, 29 (07)
  • [4] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    [J]. 2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [5] A Support Based Initialization Algorithm for Categorical Data Clustering
    Kumar, Ajay
    Kumar, Shishir
    [J]. JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2018, 11 (02) : 53 - 67
  • [6] An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    [J]. KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) : 785 - 795
  • [7] A data labeling method for clustering categorical data
    Cao, Fuyuan
    Liang, Jiye
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 2381 - 2385
  • [8] Data Reduction Method for Categorical Data Clustering
    Rendon, Erendira
    Salvador Sanchez, J.
    Garcia, Rene A.
    Abundez, Itzel
    Gutierrez, Citlalih
    Gasca, Eduardo
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2008, PROCEEDINGS, 2008, 5290 : 143 - +
  • [9] A Clustering Method for Categorical Ordinal Data
    Giordan, Marco
    Diana, Giancarlo
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2011, 40 (07) : 1315 - 1334
  • [10] k-PbC: an improved cluster center initialization for categorical data clustering
    Duy-Tai Dinh
    Van-Nam Huynh
    [J]. APPLIED INTELLIGENCE, 2020, 50 : 2610 - 2632