Initialization of K-modes clustering using outlier detection techniques

Cited by: 71
Authors
Jiang, Feng [1]
Liu, Guozhu [1]
Du, Junwei [1]
Sui, Yuefei [2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266061, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
K-modes clustering; Outlier detection; Initial cluster centers; Distance; Partition entropy; KNOWLEDGE GRANULATION; INFORMATION ENTROPY; ROUGH ENTROPY; DISSIMILARITY MEASURE; MEANS ALGORITHM; UNCERTAINTY;
DOI
10.1016/j.ins.2015.11.005
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
K-modes clustering has received much attention because it works well for categorical data sets. However, its performance is especially sensitive to the selection of initial cluster centers, so choosing proper initial cluster centers is a key step. In this paper, we consider the initialization of K-modes clustering from the viewpoint of outlier detection. We present two initialization algorithms for K-modes clustering: the first is based on the traditional distance-based outlier detection technique, and the second is based on the partition entropy-based outlier detection technique. By using these two outlier detection techniques to calculate the degree of outlierness of each object, our algorithms can guarantee that the chosen initial cluster centers are not outliers. Moreover, during initialization, we adopt a new distance metric, the weighted matching distance, to calculate the distance between two objects described by categorical attributes. Experimental results on several UCI data sets demonstrate the effectiveness of our initialization algorithms for K-modes clustering. (C) 2015 Elsevier Inc. All rights reserved.
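The general idea in the abstract (score each object's outlierness, then pick initial centers only from objects that are not outliers) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the weighted matching distance or the exact outlierness score, so the sketch assumes the plain simple matching distance (number of differing attributes) and a mean-distance outlierness score. All function names and the `num_outliers` parameter are hypothetical.

```python
def matching_distance(a, b):
    """Simple matching distance: count of attributes on which a and b differ.
    Assumption: stands in for the paper's weighted matching distance."""
    return sum(x != y for x, y in zip(a, b))


def outlierness(data):
    """Assumed distance-based score: mean distance from each object to all others."""
    n = len(data)
    return [
        sum(matching_distance(data[i], data[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def choose_initial_centers(data, k, num_outliers=1):
    """Pick k initial centers while excluding the most outlying objects."""
    scores = outlierness(data)
    # Flag the num_outliers objects with the highest scores as outliers.
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    excluded = set(ranked[:num_outliers])
    candidates = [i for i in range(len(data)) if i not in excluded]
    if k > len(candidates):
        raise ValueError("not enough non-outlier objects for k centers")

    # First center: the least outlying candidate.
    centers = [min(candidates, key=lambda i: scores[i])]
    # Remaining centers: the candidate farthest from its nearest chosen center.
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        centers.append(
            max(
                remaining,
                key=lambda i: min(matching_distance(data[i], data[c]) for c in centers),
            )
        )
    return [data[i] for i in centers]


# Toy categorical data: two groups plus one object unlike everything else.
example = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
    ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"),
    ("c", "z", "t"),
]
print(choose_initial_centers(example, k=2))
```

On the toy data, the object ("c", "z", "t") differs from every other object on all three attributes, so it receives the highest score and is never eligible as a center. A plain farthest-point initialization without the exclusion step would be free to select it.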
Pages: 167-183
Page count: 17
Related papers
50 entries in total
  • [21] Privacy-preserving mechanisms for k-modes clustering
    Huu Hiep Nguyen
    [J]. COMPUTERS & SECURITY, 2018, 78 : 60 - 75
  • [22] An efficient k-modes algorithm for clustering categorical datasets
    Dorman, Karin S.
    Maitra, Ranjan
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2022, 15 (01) : 83 - 97
  • [23] Approximate Clustering of Time-Series Datasets using k-Modes Partitioning
    Aghabozorgi, Saeed
    Teh Ying Wah
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2015, 31 (01) : 207 - 228
  • [24] A fuzzy k-modes algorithm for clustering categorical data
    Huang, ZX
    Ng, MK
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1999, 7 (04) : 446 - 452
  • [25] A genetic k-modes algorithm for clustering categorical data
    Gan, GJ
    Yang, ZJ
    Wu, JH
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 195 - 202
  • [26] On the impact of dissimilarity measure in k-modes clustering algorithm
    Ng, Michael K.
    Li, Mark Junjie
    Huang, Joshua Zhexue
    He, Zengyou
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, 29 (03) : 503 - 507
  • [27] Feature-Weighted Fuzzy K-Modes Clustering
    Nataliani, Yessica
    Yang, Miin-Shen
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, METAHEURISTICS & SWARM INTELLIGENCE (ISMSI 2019), 2019 : 63 - 68
  • [28] Clustering categorical data: Soft rounding k-modes
    Gavva, Surya Teja
    Karthik, C. S.
    Punna, Sharath
    [J]. INFORMATION AND COMPUTATION, 2024, 296
  • [29] BINARY CODES K-MODES CLUSTERING FOR HSI SEGMENTATION
    Berthier, Michel
    El Asmar, Saadallah
    Frelicot, Carl
    [J]. 2016 IEEE 12TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2016
  • [30] DP-k-modes: A self-tuning k-modes clustering algorithm
    Xie, Juanying
    Wang, Mingzhao
    Lu, Xiaoxiao
    Liu, Xinglin
    Grant, Philip W.
    [J]. Pattern Recognition Letters, 2022, 158 : 117 - 124