Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering

Cited by: 25
Authors
Wangchamhan, Tanachapong [1]
Chiewchanwattana, Sirapat [1]
Sunat, Khamron [1]
Affiliation
[1] Khon Kaen Univ, Dept Comp Sci, Fac Sci, Khon Kaen 40002, Thailand
Keywords
Data clustering; Search clustering algorithm; Hybrid clustering algorithm; League Championship Algorithm (LCA); Chaos optimization algorithms (COA); Mixed-type data; OPTIMIZATION ALGORITHM; GLOBAL OPTIMIZATION; SEARCH
DOI
10.1016/j.eswa.2017.08.004
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success of expert and intelligent systems depends on selecting the correct data clusters. The k-means algorithm is a well-known method for solving data clustering problems, but it depends heavily not only on its initial solution but also on the distance function it uses. A number of algorithms have been proposed to address the centroid initialization problem, but the solutions they produce still do not yield optimal clusters. This paper proposes three algorithms: (i) C-LCA, a search algorithm that improves on the League Championship Algorithm (LCA); (ii) a search clustering algorithm using C-LCA (SC-LCA); and (iii) a hybrid clustering algorithm, the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA), which consists of two computation stages. C-LCA uses chaotic adaptation for the retreat and approach parameters instead of constants, which can enhance its search capability. Furthermore, the original k-means algorithm uses the Euclidean distance, which cannot properly handle categorical attributes; to overcome this limitation, we adopt the Gower distance together with a mechanism that enforces the discrete-value requirement of categorical attributes. The proposed algorithms can handle not only purely numeric data but also mixed-type data, and they can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. SC-LCA and KSC-LCA were compared with 16 established algorithms: three k-means variants (k-means, k-means++, and global k-means), four search clustering algorithms, and nine hybrids of k-means with several state-of-the-art evolutionary algorithms. The experimental results show that SC-LCA produces the clusters with the highest F-Measure on the purely categorical dataset, and KSC-LCA produces the clusters with the highest F-Measure on the purely numeric and mixed-type datasets tested.
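The abstract says the Euclidean distance is replaced by the Gower distance so that categorical attributes are handled properly. The sketch below is a minimal, unweighted version of Gower's coefficient (range-normalised absolute difference for numeric attributes, simple matching for categorical ones) plus one k-means-style assignment and update step. The function names are hypothetical, and taking the mode for categorical centroid values is an assumption borrowed from k-prototypes-style methods, not necessarily the paper's own discrete-value mechanism.

```python
from collections import Counter
from typing import List, Sequence, Union

Value = Union[float, str]
Row = Sequence[Value]

def attribute_ranges(data: Sequence[Row], is_numeric: Sequence[bool]) -> List[float]:
    """Range R_k of every numeric attribute over the dataset (0.0 for categorical ones)."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[k]) for row in data]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)
    return ranges

def gower_distance(x: Row, y: Row, is_numeric: Sequence[bool], ranges: Sequence[float]) -> float:
    """Unweighted Gower distance: the mean of per-attribute dissimilarities, each in [0, 1]."""
    total = 0.0
    for k, numeric in enumerate(is_numeric):
        if numeric:
            # Range-normalised absolute difference; a constant attribute contributes 0.
            if ranges[k] > 0:
                total += abs(float(x[k]) - float(y[k])) / ranges[k]
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x[k] == y[k] else 1.0
    return total / len(is_numeric)

def assign(data: Sequence[Row], centroids: Sequence[Row],
           is_numeric: Sequence[bool], ranges: Sequence[float]) -> List[int]:
    """Index of the nearest centroid (by Gower distance) for every row."""
    return [min(range(len(centroids)),
                key=lambda j: gower_distance(row, centroids[j], is_numeric, ranges))
            for row in data]

def update_centroid(members: Sequence[Row], is_numeric: Sequence[bool]) -> List[Value]:
    """Mean for numeric attributes, mode for categorical ones, so every categorical
    centroid value is an actual category (assumed k-prototypes-style rule)."""
    centroid: List[Value] = []
    for k, numeric in enumerate(is_numeric):
        column = [row[k] for row in members]
        if numeric:
            centroid.append(sum(float(v) for v in column) / len(column))
        else:
            centroid.append(Counter(column).most_common(1)[0][0])
    return centroid
```

For example, with rows `[1.0, "red"]`, `[2.0, "red"]`, `[9.0, "blue"]`, `[10.0, "blue"]` and `is_numeric = [True, False]`, the numeric range is 9.0, so the distance between the first and third rows is (8/9 + 1) / 2, and the centroid of the first two rows is `[1.5, "red"]`.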
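C-LCA replaces the constant retreat and approach coefficients of LCA with chaotically varying values. The abstract does not name the chaotic map, so the sketch below assumes the logistic map x_{t+1} = 4 x_t (1 - x_t), a common choice in chaos optimization, and draws one (retreat, approach) pair per season (iteration) from two independently seeded sequences. The seeds, the output range, and the helper names are hypothetical; the LCA update equations that would consume these coefficients are not shown.

```python
from itertools import islice
from typing import Iterator, List, Tuple

def logistic_map(x0: float, mu: float = 4.0) -> Iterator[float]:
    """Yield x_1, x_2, ... from the logistic map x_{t+1} = mu * x_t * (1 - x_t)."""
    # With mu = 4 the map is chaotic on (0, 1); these seeds fall onto a fixed point or 0.
    if not 0.0 < x0 < 1.0 or x0 in (0.25, 0.5, 0.75):
        raise ValueError("x0 must lie in (0, 1) and not be 0.25, 0.5 or 0.75")
    x = x0
    while True:
        x = mu * x * (1.0 - x)
        yield x

def chaotic_coefficients(n_seasons: int, low: float = 0.0, high: float = 1.0,
                         seeds: Tuple[float, float] = (0.3, 0.6)) -> List[Tuple[float, float]]:
    """Per-season (retreat, approach) coefficients from two independent logistic maps,
    rescaled from [0, 1] to [low, high]. Range and seeds are illustrative assumptions."""
    retreat = logistic_map(seeds[0])
    approach = logistic_map(seeds[1])
    scale = high - low
    return [(low + scale * r, low + scale * a)
            for r, a in islice(zip(retreat, approach), n_seasons)]
```

The two seeds are deliberately not symmetric about 0.5: with mu = 4, seeds x and 1 - x produce the same sequence after one step, which would make the two coefficients identical.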
On 13 of the 14 datasets, the centroids produced by SC-LCA achieved better F-Measures than those of the k-means algorithm. On the Tic-Tac-Toe dataset, which contains only categorical attributes, SC-LCA achieves an F-Measure of 66.61, which is 21.74 points above that of k-means (44.87). KSC-LCA produced better centroids than k-means on all 14 datasets, with a maximum F-Measure improvement of 11.59 points. In terms of computational cost, however, SC-LCA and KSC-LCA required a larger number of function evaluations (NFEs) than k-means and its variants; on this measure, KSC-LCA ranks first and SC-LCA ranks fourth among the hybrid clustering and search clustering algorithms tested. SC-LCA and KSC-LCA are therefore general and effective clustering algorithms that could be used when an expert or intelligent system requires accurate, high-speed cluster selection. (C) 2017 Elsevier Ltd. All rights reserved.
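The results above are reported as F-Measure values on a 0-100 scale (for example, 66.61 versus 44.87 on Tic-Tac-Toe). The abstract does not define the measure, so the sketch below assumes the common clustering F-Measure: each true class is matched with the cluster that maximises the harmonic mean of precision and recall, and the per-class scores are weighted by class size. The function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence

def clustering_f_measure(labels: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Class-size-weighted best-match F-Measure, scaled to 0-100 (assumed definition)."""
    if len(labels) != len(clusters) or not labels:
        raise ValueError("labels and clusters must be non-empty and of equal length")
    n = len(labels)
    class_sizes = Counter(labels)          # n_i: points in true class i
    cluster_sizes = Counter(clusters)      # n_j: points in cluster j
    joint = Counter(zip(labels, clusters)) # n_ij: points of class i in cluster j
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return 100.0 * total
```

Because classes are matched to clusters by best F rather than by label equality, cluster IDs need not agree with class labels: `clustering_f_measure(["a", "a", "b", "b"], [1, 1, 0, 0])` gives 100.0.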
Pages: 146-167
Number of pages: 22