Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering

Cited by: 25

Authors:

Wangchamhan, Tanachapong [1]

Chiewchanwattana, Sirapat [1]

Sunat, Khamron [1]

Affiliations:

[1] Khon Kaen Univ, Dept Comp Sci, Fac Sci, Khon Kaen 40002, Thailand

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2017 / Vol. 90

Keywords:

Data clustering; Search clustering algorithm; Hybrid clustering algorithm; League Championship Algorithm (LCA); Chaos optimization algorithms (COA); Mixed-type data; OPTIMIZATION ALGORITHM; GLOBAL OPTIMIZATION; SEARCH;

DOI:

10.1016/j.eswa.2017.08.004

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The success rate of an expert or intelligent system depends on selecting the correct data clusters. The k-means algorithm is a well-known method for solving data clustering problems, but it depends heavily on both its initial solution and the distance function used. A number of algorithms have been proposed to address the centroid initialization problem, but the solutions they produce do not yield optimum clusters. This paper proposes three algorithms: (i) the search algorithm C-LCA, an improved League Championship Algorithm (LCA); (ii) a search clustering algorithm using C-LCA (SC-LCA); and (iii) a hybrid clustering algorithm, the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA), which has two computation stages. The C-LCA employs chaotic adaptation for the retreat and approach parameters, rather than constants, which can enhance the search capability. Furthermore, to overcome the limitation of the original k-means algorithm, whose Euclidean distance cannot handle categorical attributes properly, we adopt the Gower distance and a mechanism for handling the discrete-value requirement of categorical attributes. The proposed algorithms can handle not only pure numeric data but also mixed-type data, and can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. The SC-LCA and KSC-LCA competed with 16 established algorithms, including the k-means, k-means++, and global k-means algorithms, four search clustering algorithms, and nine hybrids of the k-means algorithm with several state-of-the-art evolutionary algorithms. The experimental results show that the SC-LCA produces the clusters with the highest F-Measure on the pure categorical dataset, and the KSC-LCA produces the clusters with the highest F-Measure on the pure numeric and mixed-type datasets tested.
On 13 of the 14 datasets, the centroids produced by the SC-LCA had better F-Measures than those of the k-means algorithm. On the Tic-Tac-Toe dataset, which contains only categorical attributes, the SC-LCA achieves an F-Measure of 66.61, which is 21.74 points above that of the k-means algorithm (44.87). The KSC-LCA produced better centroids than the k-means algorithm on all 14 datasets; the maximum F-Measure improvement was 11.59 points. However, in terms of computational time, the SC-LCA and KSC-LCA took more NFEs than the k-means and its variants, although the KSC-LCA ranks first and the SC-LCA ranks fourth among the hybrid clustering and search clustering algorithms that we tested. Therefore, the SC-LCA and KSC-LCA are general and effective clustering algorithms that could be used when an expert or intelligent system requires accurate, high-speed cluster selection. (C) 2017 Elsevier Ltd. All rights reserved.
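
Illustrative note (not part of the published abstract): the abstract names the Gower distance as the replacement for the Euclidean distance on mixed-type data. The following is a minimal sketch of the standard unweighted Gower distance between two records, not the authors' implementation. The function name `gower_distance`, the `numeric_ranges` parameter, and the equal weighting of attributes are assumptions made for illustration.

```python
from typing import Optional, Sequence, Union

Value = Union[float, int, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Sequence[Optional[float]],
) -> float:
    """Unweighted Gower distance between two mixed-type records.

    numeric_ranges[i] holds (max - min) of attribute i over the dataset
    when the attribute is numeric, or None when it is categorical.
    (Hypothetical interface chosen for this sketch.)
    """
    if not (len(a) == len(b) == len(numeric_ranges)):
        raise ValueError("records and numeric_ranges must have equal length")
    if len(a) == 0:
        raise ValueError("records must have at least one attribute")

    total = 0.0
    for x, y, attr_range in zip(a, b, numeric_ranges):
        if attr_range is None:
            # Categorical attribute: simple mismatch (0 if equal, else 1).
            total += 0.0 if x == y else 1.0
        elif attr_range > 0:
            # Numeric attribute: absolute difference scaled by the range.
            total += abs(float(x) - float(y)) / attr_range
        # A numeric attribute with zero range is constant and adds nothing.

    # Assumption: every attribute carries equal weight.
    return total / len(a)


if __name__ == "__main__":
    record_1 = [1.0, "red"]
    record_2 = [3.0, "blue"]
    ranges = [4.0, None]  # first attribute numeric, second categorical
    print(gower_distance(record_1, record_2, ranges))
```

Because each per-attribute term lies in [0, 1], numeric and categorical attributes contribute on a comparable scale, which is what allows one distance function to serve pure numeric, pure categorical, and mixed-type records.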

Pages: 146-167

Page count: 22

50 items in total

[1] A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets
Ahmad, Amir
Dey, Lipika
PATTERN RECOGNITION LETTERS, 2011, 32 (07) : 1062 - 1069
[2] A Weight Entropy k-means Algorithm for Clustering Dataset with Mixed Numeric and Categorical Data
Li, Taoying
Chen, Yan
FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 1, PROCEEDINGS, 2008, : 36 - 41
[3] A modified K-means algorithm for categorical data clustering
Sun, Y
Zhu, QM
Chen, ZX
IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 31 - 37
[4] A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
Ohn Mar San
Van-Nam Huynh
Yoshiteru Nakamori
Journal of Systems Science & Complexity, 2003, (04) : 562 - 571
[5] A k-mean clustering algorithm for mixed numeric and categorical data
Ahmad, Amir
Dey, Lipika
DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) : 503 - 527
[6] An improved k-prototypes clustering algorithm for mixed numeric and categorical data
Ji, Jinchao
Bai, Tian
Zhou, Chunguang
Ma, Chao
Wang, Zhe
NEUROCOMPUTING, 2013, 120 : 590 - 596
[7] A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data
Ji, Jinchao
Pang, Wei
Zhou, Chunguang
Han, Xiao
Wang, Zhe
KNOWLEDGE-BASED SYSTEMS, 2012, 30 : 129 - 135
[8] K-Means Extensions for Clustering Categorical Data
Alwersh, Mohammed
Kovacs, Laszlo
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09) : 492 - 507
[9] An efficient K-means clustering algorithm for tall data
Capo, Marco
Perez, Aritz
Lozano, Jose A.
DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 34 (03) : 776 - 811
