A novel clustering algorithm based on data transformation approaches

Cited by: 21
Authors
Azimi, Rasool [1]
Ghayekhloo, Mohadeseh [1]
Ghofrani, Mahmoud [2]
Sajedi, Hedieh [3]
Affiliations
[1] Islamic Azad Univ, Qazvin Branch, Young Researchers & Elite Club, Qazvin, Iran
[2] Univ Washington, Sch Sci Technol Engn & Math STEM, Bothell, WA USA
[3] Univ Tehran, Dept Comp Sci, Coll Sci, Tehran, Iran
Keywords
Data mining; Clustering; K-means; Data transformation; Silhouette; Transformed K-means
DOI
10.1016/j.eswa.2017.01.024
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering provides a knowledge acquisition method for intelligent systems. This paper proposes a novel data-clustering algorithm that combines a new initialization technique, the K-means algorithm, and a new gradual data transformation approach. By increasing the coherence of the clusters, it provides more accurate clustering results than the K-means algorithm and its variants. The proposed data transformation approach also solves the problem of empty cluster generation, which frequently occurs in other clustering algorithms. In addition, the paper proposes an efficient method, based on the principal component transformation and a modified silhouette algorithm, to determine the number of clusters. Several data sets are used to evaluate how well the proposed method handles the empty cluster generation problem, and to compare its accuracy and computational performance with other K-means based initialization techniques and clustering methods. The estimation method for determining the number of clusters is likewise evaluated and compared with other estimation algorithms. The significance of the proposed method lies in addressing the limitations of K-means based clustering and improving the accuracy of clustering, an important method in data mining and expert systems. Applied to knowledge acquisition from time series data such as wind, solar, electric load, and stock market data, the proposed method serves as a pre-processing tool for selecting the most appropriate data to feed into neural networks or other estimators used to forecast such time series. In addition, using the knowledge discovered by the proposed K-means clustering to develop rule-based expert systems is one of the main impacts of the proposed method. (C) 2017 Elsevier Ltd. All rights reserved.
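The abstract names three ingredients without giving their details: an initialization step, K-means with protection against empty clusters, and a silhouette-based estimate of the number of clusters. The sketch below illustrates only the generic versions of these ideas and is not the authors' method. It assumes farthest-first initialization as a stand-in for the paper's initialization technique, re-seeds an empty cluster with the worst-fitted point instead of applying the gradual data transformation, uses the standard (unmodified) silhouette, and omits the principal component transformation. All function names are hypothetical.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic stand-in initialization (assumption, not the paper's):
    start from the first point, then repeatedly add the point farthest
    from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(X[d.min(axis=1).argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=100):
    """Lloyd's K-means. An empty cluster is re-seeded with the point that is
    farthest from its assigned centroid (a common workaround, assumed here)."""
    centers = farthest_first_init(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                worst = dists[np.arange(len(X)), labels].argmax()
                new_centers[j] = X[worst]
            else:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def silhouette_score(X, labels):
    """Mean standard silhouette; singleton clusters contribute 0."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

def estimate_k(X, k_range=range(2, 7)):
    """Pick the k in k_range with the highest mean silhouette."""
    scores = {k: silhouette_score(X, kmeans(X, k)[0]) for k in k_range}
    return max(scores, key=scores.get), scores

# Synthetic demo data (made up for illustration): three separated 2-D blobs.
rng = np.random.default_rng(42)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in blob_centers])

best_k, scores = estimate_k(X)
print("estimated number of clusters:", best_k)
```

Farthest-first initialization is used here mainly because it is deterministic and tends to place one starting center in each well-separated group, which keeps the demo reproducible; random or K-means++ initialization would be equally valid stand-ins.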
Pages: 59-70
Number of pages: 12