Initialization of K-modes clustering using outlier detection techniques

Cited by: 71
Authors
Jiang, Feng [1]
Liu, Guozhu [1]
Du, Junwei [1]
Sui, Yuefei [2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266061, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
K-modes clustering; Outlier detection; Initial cluster centers; Distance; Partition entropy; KNOWLEDGE GRANULATION; INFORMATION ENTROPY; ROUGH ENTROPY; DISSIMILARITY MEASURE; MEANS ALGORITHM; UNCERTAINTY;
DOI
10.1016/j.ins.2015.11.005
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
K-modes clustering has received much attention because it works well for categorical data sets. However, its performance is especially sensitive to the selection of initial cluster centers, so choosing proper initial cluster centers is a key step. In this paper, we consider the initialization of K-modes clustering from the viewpoint of outlier detection. We present two initialization algorithms for K-modes clustering: the first is based on the traditional distance-based outlier detection technique, and the second is based on the partition entropy-based outlier detection technique. By using these two outlier detection techniques to calculate the degree of outlierness of each object, our algorithms can guarantee that the chosen initial cluster centers are not outliers. Moreover, during initialization, we adopt a new distance metric, the weighted matching distance metric, to calculate the distance between two objects described by categorical attributes. Experimental results on several UCI data sets demonstrate the effectiveness of our initialization algorithms for K-modes clustering. (C) 2015 Elsevier Inc. All rights reserved.
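The idea in the abstract, choosing initial centers that are far apart yet not outliers, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not define the weighted matching distance or the exact outlierness measure, so the sketch assumes a plain simple matching (mismatch count) distance, takes outlierness to be the mean distance to all other objects, and uses a hypothetical merit score that multiplies "non-outlierness" by the distance to the nearest center already chosen. All function names are illustrative.

```python
from typing import List, Sequence

Record = Sequence[str]


def matching_distance(a: Record, b: Record) -> int:
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def outlierness(data: List[Record]) -> List[float]:
    """Assumed outlier score: mean distance from each record to all the others."""
    n = len(data)
    return [
        sum(matching_distance(data[i], data[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def init_centers(data: List[Record], k: int) -> List[int]:
    """Return indices of k initial centers that avoid outlying records."""
    m = len(data[0])  # number of attributes, used to scale scores into [0, 1]
    scores = outlierness(data)

    # Start from the least outlying record.
    chosen = [min(range(len(data)), key=lambda i: scores[i])]

    while len(chosen) < k:
        def merit(i: int) -> float:
            # Hypothetical trade-off: far from existing centers, but not an outlier.
            nearest = min(matching_distance(data[i], data[c]) for c in chosen)
            return (1.0 - scores[i] / m) * nearest

        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=merit))
    return chosen
```

On a toy data set with two dense groups and one record that matches nothing else, this sketch picks one center from each group and skips the isolated record, which is the behaviour the abstract describes. A farthest-point heuristic without the outlierness factor would tend to pick the isolated record instead.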
Pages: 167-183
Page count: 17
Related papers
50 in total
  • [1] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    [J]. 2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [2] Cluster center initialization algorithm for K-modes clustering
    Khan, Shehroz S.
    Ahmad, Amir
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (18) : 7444 - 7456
  • [3] K-modes clustering
    Chaturvedi, A
    Green, PE
    Carroll, JD
    [J]. JOURNAL OF CLASSIFICATION, 2001, 18 (01) : 35 - 55
  • [4] K-modes Clustering
    Anil Chaturvedi
    Paul E. Green
    J. Douglas Caroll
    [J]. Journal of Classification, 2001, 18 : 35 - 55
  • [5] A Modified Initialization Method to Find an Initial Center for Fuzzy K-Modes Clustering
    Saranya, S.
    Jayanthi, P.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT TECHNIQUES IN CONTROL, OPTIMIZATION AND SIGNAL PROCESSING (INCOS), 2017,
  • [6] A note on K-modes clustering
    Huang, ZX
    Ng, MK
    [J]. JOURNAL OF CLASSIFICATION, 2003, 20 (02) : 257 - 261
  • [7] A Note on K-modes Clustering
    Zhexue Huang
    Michael K. Ng
    [J]. Journal of Classification, 2003, 20 : 257 - 261
  • [8] DP- k-modes: A self-tuning k-modes clustering algorithm
    Xie, Juanying
    Wang, Mingzhao
    Lu, Xiaoxiao
    Liu, Xinglin
    Grant, Philip W.
    [J]. PATTERN RECOGNITION LETTERS, 2022, 158 : 117 - 124
  • [9] Approximation algorithms for K-modes clustering
    He, Zengyou
    Deng, Shengchun
    Xu, Xiaofei
    [J]. COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS, 2006, 4114 : 296 - 302
  • [10] K-modes and Entropy Cluster Centers Initialization Methods
    Ali, Doaa S.
    Ghoneim, Ayman
    Saleh, Mohamed
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON OPERATIONS RESEARCH AND ENTERPRISE SYSTEMS (ICORES), 2017, : 447 - 454