A new validity clustering index-based on finding new centroid positions using the mean of clustered data to determine the optimum number of clusters

Cited by: 17
Authors
Abdalameer, Ahmed Khaldoon [1]
Alswaitti, Mohammed [2]
Alsudani, Ahmed Adnan [3]
Isa, Nor Ashidi Mat [1]
Affiliations
[1] Univ Sains Malaysia, Sch Elect & Elect Engn, Engn Campus, Nibong Tebal 14300, Penang, Malaysia
[2] Xiamen Univ Malaysia, Sch Elect & Comp Engn ICT, Sepang 43900, Selangor, Malaysia
[3] Jawaharlal Nehru Technol Univ Hyderabad, Sch Elect & Elect Engn, Engn Campus, Hyderabad 500085, Telangana, India
Keywords
Number of clusters; Clustering validity index; K-means; Fuzzy C-means; Hierarchical clustering; Differential evolution; Validation; Algorithm
DOI
10.1016/j.eswa.2021.116329
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering, an unsupervised pattern classification method, plays an important role in identifying the structure of input datasets. It partitions an input dataset into clusters or groups, where the optimum number of clusters is either known a priori or determined automatically. In automatic clustering, performance is evaluated using a cluster validity index (CVI), which determines the optimum number of clusters in the data. Previous work has shown that improper positioning of cluster centroids by clustering algorithms can degrade the validation process and the performance of state-of-the-art CVIs. In addition, those CVIs work properly only with certain clustering algorithms and simple dataset structures; their performance degrades when they are applied to other clustering algorithms or to more complex datasets. This study proposes an efficient CVI, namely, the validity clustering index based on finding the mean of clustered data (VCIM). The proposed approach combines the properties of the score function index and the mean to determine new cluster centroid positions. The performance of the VCIM index is compared with that of well-known CVIs on both artificial and real-life datasets. The results on artificial datasets show that the proposed VCIM index outperforms the other CVIs in determining the true number of clusters for five conventional clustering algorithms, namely, K-means, Fuzzy C-means, agglomerative hierarchical average linkage clustering, variance-based differential evolution, and density peaks clustering with particle swarm optimization (PDPC). For the 14 real-world datasets, the proposed VCIM index correctly determined the optimum number of clusters for 11 out of 14 datasets with the K-means clustering algorithm, 9 out of 14 with both the Fuzzy C-means and agglomerative hierarchical average linkage clustering algorithms, 12 out of 14 with the variance-based differential evolution algorithm, and 11 out of 14 with PDPC. These results show the significance of the proposed VCIM when combined with clustering algorithms and indicate its potential in various clustering applications.
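To make the role of a CVI concrete, the following minimal sketch shows the general procedure the abstract describes: cluster the data for several candidate values of k, score each partition with a validity index, and keep the k with the best score. The abstract does not give the VCIM formula, so this sketch substitutes the well-known Calinski-Harabasz index as a stand-in; it is not the authors' method. The toy dataset, the farthest-first initialisation, and all function names (`kmeans`, `calinski_harabasz`, `select_k`) are assumptions made for illustration.

```python
# Illustrative only: choosing the number of clusters with a validity index.
# Calinski-Harabasz is a stand-in; the VCIM formula is not given in the abstract.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def farthest_first_init(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from the chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; each centroid is the mean of its clustered data."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return labels, centroids

def calinski_harabasz(points, labels, centroids):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n, k = len(points), len(centroids)
    overall = mean_point(points)
    within = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    between = sum(
        labels.count(j) * sq_dist(centroids[j], overall) for j in range(k)
    )
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def select_k(points, k_values):
    """Score every candidate k and return the best one with all scores."""
    scores = {}
    for k in k_values:
        labels, centroids = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centroids)
    return max(scores, key=scores.get), scores

# Hypothetical toy data: three compact, well-separated groups of five points.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

best_k, scores = select_k(points, range(2, 6))
print(best_k)
```

On this toy data the index peaks at the three groups that were constructed. On real datasets the outcome depends on both the clustering algorithm and the index, which is the sensitivity the abstract highlights.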
Pages: 14