An efficient k-modes algorithm for clustering categorical datasets

Authors
Dorman, Karin S. [1, 2]
Maitra, Ranjan [2]
Affiliations
[1] Iowa State Univ, Dept Genet Dev & Cell Biol, Ames, IA USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
Funding
National Institute of Food and Agriculture; United States Department of Agriculture
Keywords
categorical data clustering; k-modes; OT algorithm; OTQT algorithm; LATENT STRUCTURE-ANALYSIS; DISSIMILARITY MEASURE; SIMILARITY; DISTANCE; SET; ATTRIBUTE
DOI
10.1002/sam.11546
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mining clusters from data is an important endeavor in many applications. The k-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but it does not apply to categorical-valued observations. The k-modes method addresses this lacuna by replacing the Euclidean distance with the Hamming distance and the means with the modes in the k-means objective function. We provide a novel, computationally efficient implementation of k-modes, called Optimal Transfer Quick Transfer (OTQT). We prove that OTQT finds updates that improve the objective function but are undetectable by existing k-modes algorithms. Although slightly slower per iteration due to its algorithmic complexity, OTQT is always more accurate and almost always faster to reach the final optimum (and only barely slower on some datasets). Thus, we recommend OTQT as the preferred default algorithm for k-modes optimization.
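The abstract describes k-modes as k-means with the Hamming distance in place of the Euclidean distance and attribute-wise modes in place of means. The Python sketch below is a minimal illustration of that objective, driven by a plain alternating (Lloyd-style) assign-then-update loop. It is not the OTQT algorithm. The function names (`hamming`, `column_modes`, `objective`, `kmodes_lloyd`), the tie-breaking rules, and the example data are hypothetical choices made only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]  # one observation: a tuple of categorical attribute values


def hamming(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of attributes on which x and y disagree."""
    return sum(a != b for a, b in zip(x, y))


def column_modes(rows: List[Row]) -> Row:
    """Attribute-wise mode of a non-empty cluster.

    Ties go to the value encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def objective(data: List[Row], labels: List[int], modes: List[Row]) -> int:
    """k-modes objective: total Hamming distance from each row to its cluster mode."""
    return sum(hamming(x, modes[k]) for x, k in zip(data, labels))


def kmodes_lloyd(
    data: List[Row], init_modes: List[Row], max_iter: int = 100
) -> Tuple[List[int], List[Row]]:
    """Alternate assignment and mode-update steps until labels stop changing.

    Hypothetical baseline for illustration only; this is not OTQT.
    """
    modes = list(init_modes)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest mode (ties -> lowest index).
        new_labels = [
            min(range(len(modes)), key=lambda k: hamming(x, modes[k])) for x in data
        ]
        if new_labels == labels:
            break  # no row changed cluster, so the modes are already up to date
        labels = new_labels
        # Update step: recompute each non-empty cluster's mode; empty clusters keep theirs.
        for k in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                modes[k] = column_modes(members)
    return labels, modes


# Usage on a tiny made-up dataset with two categorical attributes.
data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "y"), ("b", "x")]
labels, modes = kmodes_lloyd(data, init_modes=[("a", "x"), ("b", "y")])
print(labels, modes, objective(data, labels, modes))
# -> [0, 0, 0, 1, 1, 0] [('a', 'x'), ('b', 'y')] 2
```

In this sketch each row is scored only against the current modes, which stay fixed until the update step. Lowest-index tie-breaking and keeping the previous mode for an empty cluster are assumptions of the sketch, not details taken from the paper.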
Pages: 83-97
Number of pages: 15