Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering

Cited by: 25
Authors
Wangchamhan, Tanachapong [1]
Chiewchanwattana, Sirapat [1]
Sunat, Khamron [1]
Affiliations
[1] Khon Kaen Univ, Dept Comp Sci, Fac Sci, Khon Kaen 40002, Thailand
Keywords
Data clustering; Search clustering algorithm; Hybrid clustering algorithm; League Championship Algorithm (LCA); Chaos optimization algorithms (COA); Mixed-type data; OPTIMIZATION ALGORITHM; GLOBAL OPTIMIZATION; SEARCH
DOI
10.1016/j.eswa.2017.08.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success rate of an expert or intelligent system depends on selecting the correct data clusters. The k-means algorithm is a well-known method for solving data clustering problems, but it depends heavily both on its initial solution and on the distance function used. A number of algorithms have been proposed to address the centroid initialization problem, but the solutions they produce are not optimal clusters. This paper proposes three algorithms: (i) the search algorithm C-LCA, an improved League Championship Algorithm (LCA); (ii) a search clustering algorithm using C-LCA (SC-LCA); and (iii) a hybrid clustering algorithm, the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA), which has two computation stages. The C-LCA employs chaotic adaptation of the retreat and approach parameters, rather than constants, which can enhance the search capability. Furthermore, the original k-means algorithm uses the Euclidean distance, which cannot handle categorical attributes properly. To overcome this limitation, we adopt the Gower distance together with a mechanism that enforces the discrete-value requirement of categorical attributes. The proposed algorithms can therefore handle not only pure numeric data but also mixed-type data, and can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. The SC-LCA and KSC-LCA competed with 16 established algorithms: the k-means, k-means++, and global k-means algorithms, four search clustering algorithms, and nine hybrids of the k-means algorithm with several state-of-the-art evolutionary algorithms. The experimental results show that the SC-LCA produces the clusters with the highest F-Measure on the pure categorical dataset, and the KSC-LCA produces the clusters with the highest F-Measure on the pure numeric and mixed-type datasets tested.
On 13 of the 14 datasets, the centroids produced by the SC-LCA had better F-Measures than those of the k-means algorithm. On the Tic-Tac-Toe dataset, which contains only categorical attributes, the SC-LCA achieves an F-Measure of 66.61, which is 21.74 points higher than that of the k-means algorithm (44.87). The KSC-LCA produced better centroids than the k-means algorithm on all 14 datasets; the maximum F-Measure improvement was 11.59 points. In terms of computational time, the SC-LCA and KSC-LCA required more NFEs (function evaluations) than the k-means algorithm and its variants, but among the hybrid clustering and search clustering algorithms tested, the KSC-LCA ranks first and the SC-LCA ranks fourth. Therefore, the SC-LCA and KSC-LCA are general and effective clustering algorithms that could be used when an expert or intelligent system requires accurate, high-speed cluster selection. © 2017 Elsevier Ltd. All rights reserved.
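The abstract names the Gower distance as the replacement for the Euclidean distance on mixed-type data but does not give its formula. As a rough illustration only, the following Python sketch implements the standard unweighted Gower dissimilarity: range-normalized absolute difference for numeric attributes, simple mismatch (0 or 1) for categorical attributes, averaged over all attributes. The function name `gower_distance` and its parameters are hypothetical, and the paper's exact formulation (weights, handling of missing values) may differ.

```python
from typing import Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    is_categorical: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Unweighted Gower dissimilarity between two records (illustrative sketch).

    ranges[i] is the (max - min) of numeric attribute i over the dataset;
    it is ignored for categorical attributes.
    """
    total = 0.0
    for x, y, categorical, attr_range in zip(a, b, is_categorical, ranges):
        if categorical:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
        elif attr_range > 0:
            # Range-normalized absolute difference, in [0, 1] for in-range values.
            total += abs(x - y) / attr_range
        # A constant numeric attribute (range 0) contributes nothing.
    return total / len(a)


# One numeric attribute (dataset range 4.0) and one categorical attribute.
d = gower_distance([1.0, "red"], [3.0, "blue"], [False, True], [4.0, 0.0])
print(d)  # (2/4 + 1) / 2 = 0.75
```

Because every per-attribute term lies in [0, 1] for values within the observed range, numeric and categorical attributes contribute on the same scale, which is what lets a single distance function serve mixed-type records.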
Pages: 146-167
Page count: 22