An efficient k-modes algorithm for clustering categorical datasets

Authors
Dorman, Karin S. [1, 2]
Maitra, Ranjan [2 ]
Affiliations
[1] Iowa State Univ, Dept Genet Dev & Cell Biol, Ames, IA USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
Funding
National Institute of Food and Agriculture (USA); United States Department of Agriculture
Keywords
categorical data clustering; k-modes; OT algorithm; OTQT algorithm; LATENT STRUCTURE-ANALYSIS; DISSIMILARITY MEASURE; SIMILARITY; DISTANCE; SET; ATTRIBUTE;
DOI
10.1002/sam.11546
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mining clusters from data is an important endeavor in many applications. The k-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but does not apply for categorical-valued observations. The k-modes method addresses this lacuna by replacing the Euclidean with the Hamming distance and the means with the modes in the k-means objective function. We provide a novel, computationally efficient implementation of k-modes, called Optimal Transfer Quick Transfer (OTQT). We prove that OTQT finds updates to improve the objective function that are undetectable to existing k-modes algorithms. Although slightly slower per iteration due to algorithmic complexity, OTQT is always more accurate and almost always faster (and only barely slower on some datasets) to the final optimum. Thus, we recommend OTQT as the preferred, default algorithm for k-modes optimization.
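To make the abstract's description concrete, here is a minimal Python sketch of the k-modes objective: Hamming distance in place of Euclidean distance, and per-attribute modes in place of means. It uses a plain alternating (Lloyd-style) update, which is an assumption chosen for illustration; it is not the OTQT algorithm, whose transfer steps the abstract does not spell out. All function names (`hamming`, `column_modes`, `kmodes_cost`, `lloyd_kmodes`) are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute (ties go to the first seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def kmodes_cost(data, labels, modes):
    """k-modes objective: total Hamming distance from each record to its cluster mode."""
    return sum(hamming(row, modes[k]) for row, k in zip(data, labels))


def lloyd_kmodes(data, init_modes, max_iter=100):
    """Alternate assignment and mode updates until the labels stop changing.

    Illustrative baseline only; this is not the OTQT procedure.
    """
    modes = list(init_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda k: hamming(row, modes[k]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's mode.
        for k in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == k]
            if members:
                modes[k] = column_modes(members)
    return labels, modes


# Toy example with two attributes and two initial modes (made-up data).
data = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "z"), ("c", "z")]
labels, modes = lloyd_kmodes(data, [("a", "x"), ("b", "z")])
print(labels, modes, kmodes_cost(data, labels, modes))
```

An alternating scheme like this one only accepts moves that look good against the current modes, which is the kind of limitation the abstract says OTQT overcomes by finding objective-improving updates that existing k-modes algorithms miss.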
Pages: 83-97
Page count: 15