A novel clustering algorithm based on data transformation approaches

Cited by: 21
Authors
Azimi, Rasool [1]
Ghayekhloo, Mohadeseh [1]
Ghofrani, Mahmoud [2]
Sajedi, Hedieh [3]
Affiliations
[1] Islamic Azad Univ, Qazvin Branch, Young Researchers & Elite Club, Qazvin, Iran
[2] Univ Washington, Sch Sci Technol Engn & Math STEM, Bothell, WA USA
[3] Univ Tehran, Dept Comp Sci, Coll Sci, Tehran, Iran
Keywords
Data mining; Clustering; K-means; Data transformation; Silhouette; Transformed K-means;
DOI
10.1016/j.eswa.2017.01.024
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering provides a knowledge acquisition method for intelligent systems. This paper proposes a novel data-clustering algorithm that combines a new initialization technique, the K-means algorithm, and a new gradual data transformation approach. By increasing the coherence of the clusters, it provides more accurate clustering results than K-means and its variants. The proposed data transformation approach also solves the problem of empty cluster generation, which frequently occurs in other clustering algorithms. An efficient method based on the principal component transformation and a modified silhouette algorithm is also proposed to determine the number of clusters. Several data sets are used to evaluate how well the proposed method handles the empty cluster generation problem, and to compare its accuracy and computational performance with other K-means based initialization techniques and clustering methods. The proposed method for estimating the number of clusters is likewise evaluated against other estimation algorithms. The significance of the proposed method lies in addressing the limitations of K-means based clustering and improving the accuracy of clustering, an important method in data mining and expert systems. Applied to knowledge acquisition in time series data such as wind, solar, electric load, and stock market data, the method provides a pre-processing tool for selecting the most appropriate data to feed into neural networks or other estimators used to forecast such time series. In addition, the knowledge discovered by the proposed clustering method can be used to develop rule-based expert systems, which is one of its main impacts. (C) 2017 Elsevier Ltd. All rights reserved.
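As a rough illustration of the ideas named in the abstract (K-means, an initialization step, empty clusters, and silhouette-based selection of the number of clusters), here is a minimal Python sketch. It is not the authors' algorithm: it uses plain Lloyd's K-means, the standard silhouette coefficient rather than the modified one, a farthest-first initialization swapped in as a simple deterministic stand-in for the paper's initialization technique, and it omits the gradual data transformation and the principal component step. The function names (`farthest_first`, `kmeans`, `silhouette`, `choose_k`) and the toy data are assumptions made for this example.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (a stand-in, not the paper's technique):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means. Returns (labels, centroids, empty_count)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # empty cluster: keep old centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    empty_count = k - len(set(labels))  # clusters that received no points
    return labels, centroids, empty_count


def silhouette(points, labels):
    """Mean standard silhouette coefficient; singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values):
    """Score each candidate k by silhouette and return the best one."""
    scores = {}
    for k in k_values:
        labels, _, empty_count = kmeans(points, k)
        if k - empty_count >= 2:  # silhouette needs at least two clusters
            scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three tight, well-separated groups of five points.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

best_k, scores = choose_k(points, range(2, 6))
print("estimated number of clusters:", best_k)
print("silhouette by k:", scores)
```

On this toy data the silhouette score peaks at the true number of groups. The `empty_count` value returned by `kmeans` shows where an empty cluster would be detected in ordinary K-means; the paper's transformation approach is described as avoiding that situation, which this sketch does not attempt to reproduce.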
Pages: 59-70
Page count: 12