A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets

Cited by: 43
Authors
Ahmad, Amir [1]
Dey, Lipika [2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Rabigh, Saudi Arabia
[2] Tata Consultancy Serv, Innovat Labs, New Delhi, India
Keywords
Clustering; Subspace clustering; Mixed data; Categorical data;
DOI
10.1016/j.patrec.2011.02.017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets. The method computes the contribution of each attribute to the different clusters, and we propose a new cost function for a k-means type algorithm. One advantage of the algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of the attributes' contributions to different clusters. It is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in the clusters and are also compared with published results. (C) 2011 Elsevier B.V. All rights reserved.
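The general idea in the abstract (a k-means type loop in which each cluster carries its own attribute weights, applied to mixed data) can be illustrated with a minimal sketch. This is not the cost function proposed in the paper, which the abstract does not spell out. The sketch assumes squared differences for numeric attributes, a 0/1 mismatch for categorical attributes, and a hypothetical weight rule that sets each weight inversely proportional to the within-cluster dispersion of that attribute. The names `mixed_distance`, `subspace_kmeans_mixed`, `EPS`, and the toy data are illustrative only.

```python
import random
from collections import Counter

EPS = 1e-3  # hypothetical smoothing constant; avoids division by zero


def mixed_distance(x, center, w, num_idx, cat_idx):
    """Attribute-weighted distance between a point and a cluster center."""
    # Numeric attributes: weighted squared difference (assumption).
    d = sum(w[j] * (x[j] - center[j]) ** 2 for j in num_idx)
    # Categorical attributes: weighted 0/1 mismatch (assumption).
    d += sum(w[j] * (x[j] != center[j]) for j in cat_idx)
    return d


def subspace_kmeans_mixed(data, k, num_idx, cat_idx, init=None, n_iter=20, seed=0):
    """Illustrative k-means type loop with one weight vector per cluster."""
    m = len(data[0])
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(data, k)
    centers = [list(c) for c in start]
    weights = [[1.0 / m] * m for _ in range(k)]  # start with uniform weights
    labels = None
    for _ in range(n_iter):
        # Assignment step: cost per iteration is linear in the number of points.
        new_labels = [
            min(range(k),
                key=lambda c: mixed_distance(x, centers[c], weights[c], num_idx, cat_idx))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # keep the old center and weights for an empty cluster
            disp = [0.0] * m
            for j in num_idx:  # numeric center: mean; dispersion: variance
                centers[c][j] = sum(x[j] for x in members) / len(members)
                disp[j] = sum((x[j] - centers[c][j]) ** 2 for x in members) / len(members)
            for j in cat_idx:  # categorical center: mode; dispersion: mismatch rate
                centers[c][j] = Counter(x[j] for x in members).most_common(1)[0][0]
                disp[j] = sum(x[j] != centers[c][j] for x in members) / len(members)
            # Hypothetical weight rule: low dispersion gives a high weight.
            inv = [1.0 / (s + EPS) for s in disp]
            total = sum(inv)
            weights[c] = [v / total for v in inv]
    return labels, centers, weights


# Toy data: attribute 0 and attribute 2 separate the groups, attribute 1 is noise.
data = [
    [1.0, 5.0, "a"], [1.2, 9.0, "a"], [0.9, 1.0, "a"], [1.1, 7.0, "a"],
    [8.0, 5.5, "b"], [8.1, 2.0, "b"], [7.9, 8.0, "b"], [8.2, 4.0, "b"],
]
labels, centers, weights = subspace_kmeans_mixed(
    data, k=2, num_idx=[0, 1], cat_idx=[2], init=[data[0], data[4]]
)
```

In this toy run the noisy attribute ends up with the smallest weight in each cluster, which is the kind of per-cluster description of attribute contribution the abstract refers to.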
Pages: 1062-1069
Number of pages: 8
Related papers
50 in total
  • [1] A Weight Entropy k-means Algorithm for Clustering Dataset with Mixed Numeric and Categorical Data
    Li, Taoying
    Chen, Yan
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 1, PROCEEDINGS, 2008, : 36 - 41
  • [2] An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
    Zhang, Kang
    Gu, Xingsheng
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [3] Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering
    Wangchamhan, Tanachapong
    Chiewchanwattana, Sirapat
    Sunat, Khamron
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 90 : 146 - 167
  • [4] Subspace K-means clustering
    Marieke E. Timmerman
    Eva Ceulemans
    Kim De Roover
    Karla Van Leeuwen
    [J]. Behavior Research Methods, 2013, 45 : 1011 - 1023
  • [5] Subspace K-means clustering
    Timmerman, Marieke E.
    Ceulemans, Eva
    De Roover, Kim
    Van Leeuwen, Karla
    [J]. BEHAVIOR RESEARCH METHODS, 2013, 45 (04) : 1011 - 1023
  • [6] A modified K-means algorithm for categorical data clustering
    Sun, Y
    Zhu, QM
    Chen, ZX
    [J]. IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 31 - 37
  • [7] A Heuristically Weighting K-Means Algorithm for Subspace Clustering
    Li, Boyang
    Jiang, Qingshan
    Chen, Lifei
    [J]. 2008 2ND INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY AND IDENTIFICATION, 2008, : 268 - +
  • [8] A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
    Ohn Mar San
    Van-Nam Huynh
    Yoshiteru Nakamori
    [J]. Journal of Systems Science & Complexity, 2003, (04) : 562 - 571
  • [9] Supplier categorization with K-means type subspace clustering
    Zhang, XJ
    Huang, JZ
    Qian, DP
    Xu, J
    Jing, LP
    [J]. FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 226 - 237
  • [10] K-Harmonic means type clustering algorithm for mixed datasets
    Ahmad, Amir
    Hashmi, Sarosh
    [J]. APPLIED SOFT COMPUTING, 2016, 48 : 39 - 49