Initialization of K-modes clustering using outlier detection techniques

Times cited: 71
Authors
Jiang, Feng [1]
Liu, Guozhu [1]
Du, Junwei [1]
Sui, Yuefei [2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266061, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
K-modes clustering; Outlier detection; Initial cluster centers; Distance; Partition entropy; Knowledge granulation; Information entropy; Rough entropy; Dissimilarity measure; Means algorithm; Uncertainty
DOI
10.1016/j.ins.2015.11.005
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
K-modes clustering has received much attention because it works well for categorical data sets. However, the performance of K-modes clustering is especially sensitive to the selection of initial cluster centers. Therefore, choosing proper initial cluster centers is a key step in K-modes clustering. In this paper, we consider the initialization of K-modes clustering from the viewpoint of outlier detection. We present two different initialization algorithms for K-modes clustering: the first is based on the traditional distance-based outlier detection technique, and the second is based on the partition entropy-based outlier detection technique. By using these two outlier detection techniques to calculate the degree of outlierness of each object, our algorithms can guarantee that the chosen initial cluster centers are not outliers. Moreover, during initialization, we adopt a new distance metric, the weighted matching distance metric, to calculate the distance between two objects described by categorical attributes. Experimental results on several UCI data sets demonstrate the effectiveness of our initialization algorithms for K-modes clustering. (C) 2015 Elsevier Inc. All rights reserved.
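The abstract describes the approach only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It scores each categorical object with a common k-nearest-neighbour distance-based outlier degree (mean mismatch distance to its `n_neighbors` nearest objects), excludes the most outlying fraction of objects (`outlier_frac`), picks farthest-first seeds among the rest, and passes them to a standard K-modes loop. The per-attribute `weights` are a hypothetical placeholder for the weighted matching distance, whose actual weighting scheme is not given here, and the partition entropy-based variant is not sketched. All function names and parameters are illustrative.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Obj = Sequence[str]  # one categorical object: a fixed-length sequence of attribute values


def matching_distance(x: Obj, y: Obj, weights: Optional[Sequence[float]] = None) -> float:
    # Simple matching distance: sum of weights over attributes where x and y differ.
    # ASSUMPTION: weights are caller-supplied placeholders (default all 1.0), not the
    # paper's weighted matching distance, whose weighting scheme is not reproduced here.
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def outlier_degree(data: List[Obj], n_neighbors: int,
                   weights: Optional[Sequence[float]] = None) -> List[float]:
    # ASSUMPTION: a common kNN distance-based score (mean distance to the
    # n_neighbors nearest other objects); the paper's definition may differ.
    scores = []
    for i, x in enumerate(data):
        dists = sorted(matching_distance(x, y, weights)
                       for j, y in enumerate(data) if j != i)
        nearest = dists[:n_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores


def init_centers(data: List[Obj], k: int, n_neighbors: int = 2,
                 outlier_frac: float = 0.2,
                 weights: Optional[Sequence[float]] = None) -> List[int]:
    # Hypothetical initializer: return indices of k seeds, never drawn from the
    # outlier_frac most outlying objects.
    scores = outlier_degree(data, n_neighbors, weights)
    order = sorted(range(len(data)), key=lambda i: scores[i])  # least outlying first
    n_keep = max(k, len(data) - int(outlier_frac * len(data)))
    candidates = order[:n_keep]
    centers = [candidates[0]]  # the least outlying object seeds the first cluster
    while len(centers) < k:
        # Farthest-first: take the candidate whose nearest chosen seed is farthest away.
        # max() keeps the first maximum, so ties go to the less outlying candidate.
        best = max((i for i in candidates if i not in centers),
                   key=lambda i: min(matching_distance(data[i], data[c], weights)
                                     for c in centers))
        centers.append(best)
    return centers


def k_modes(data: List[Obj], init_idx: List[int], max_iter: int = 20,
            weights: Optional[Sequence[float]] = None
            ) -> Tuple[Optional[List[int]], List[Tuple[str, ...]]]:
    # Standard K-modes loop: assign each object to its nearest mode, then reset each
    # mode to the most frequent value of every attribute among the cluster's members.
    modes = [tuple(data[i]) for i in init_idx]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)),
                          key=lambda c: matching_distance(x, modes[c], weights))
                      for x in data]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(len(modes)):
            members = [data[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels, modes
```

For example, given a list of equal-length tuples of strings, `init_centers(data, 2)` returns two row indices that exclude the most outlying rows, and `k_modes(data, init_centers(data, 2))` returns per-object cluster labels and the final modes. Farthest-first seeding is just one simple way to keep seeds apart; the paper may combine outlierness and distance differently.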
Pages: 167-183
Number of pages: 17