Centroid-based clustering validity: method and application to quantification of optimal cluster-data space

Cited by: 0
Authors
Nguyen, Sy Dzung [1, 2]
Affiliations
[1] Laboratory for Computational Mechatronics, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh, Viet Nam
[2] Faculty of Mechanical - Electrical and Computer Engineering, School of Technology, Van Lang University, Ho Chi Minh, Viet Nam
Funding
The authors would like to thank the Vietnam National Foundation for Science and Technology Development (NAFOSTED) for support under Grant Number 107.01-2019.328.
DOI
10.1007/s00500-024-09871-0
Abstract
Evaluating clustering validity to establish an optimal cluster-data space (CDS) is a vital task in many fields related to data mining. Most existing clustering validity indexes (CVIs) lack stability because they are too sensitive to noise, especially impulse noise. Here, we (1) propose a new CVI named DzI (Dzung Index), or fRisk2, based on an analysis of the fuzzy-set-based accumulated risk degree (FARD), and (2) present a new algorithm named fRisk2-bA for determining the optimal number of data clusters. The method evaluates the validity of centroid-based fuzzy clustering. In essence, fRisk2 still aims to increase the compactness of the data within each cluster and to widen the separation between cluster centroids. However, these properties are exploited indirectly through FARD. As a result, the proposed method not only avoids the difficulties of traditional indexes that rely directly on compactness and separation but also captures local and global attributes of the data distribution better, so that the CDS is estimated more fully. Along with the proven theoretical basis, experiments, including ones on noisy datasets obtained from measurements, showed the following comparative advantages of fRisk2. (1) Its accuracy, stability, and convergence are outstanding. (2) Its total computational cost is lower than that of the other surveyed CVIs.
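For orientation, the kind of traditional index the abstract contrasts itself with can be sketched in a few lines. The example below is not fRisk2 or fRisk2-bA, whose definitions are not given in this record; it uses the classical Xie-Beni index, which divides fuzzy within-cluster compactness by the smallest squared distance between centroids, and scans candidate cluster counts for the minimum. All function names, the toy data, the deterministic initialization, and the fuzzifier value m = 2 are assumptions made for illustration.

```python
import math

def memberships(points, centroids, m=2.0):
    """Fuzzy membership u[k][i] of point k in cluster i (standard FCM rule)."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point lying exactly on a centroid is safe.
        d = [max(math.dist(p, v), 1e-12) for v in centroids]
        u.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return u

def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-9):
    """Plain fuzzy c-means with a deterministic (assumed) initialization."""
    n, dim = len(points), len(points[0])
    ordered = sorted(points)
    # Spread the initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * c)] for i in range(c)]
    for _ in range(iters):
        u = memberships(points, centroids, m)
        new = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            new.append(tuple(sum(wk * p[a] for wk, p in zip(w, points)) / sum(w)
                             for a in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, memberships(points, centroids, m)

def xie_beni(points, centroids, u, m=2.0):
    """Xie-Beni index: compactness / (n * min centroid separation); lower is better."""
    n = len(points)
    compactness = sum(u[k][i] ** m * math.dist(points[k], v) ** 2
                      for k in range(n) for i, v in enumerate(centroids))
    separation = min(math.dist(a, b) ** 2
                     for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return math.inf if separation == 0.0 else compactness / (n * separation)

# Toy data (assumed): three well-separated 2-D groups of six points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.3, 0.3), (-0.3, -0.3)]
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Score each candidate cluster count and keep the one with the lowest index.
scores = {}
for c in range(2, 6):
    v, u = fuzzy_c_means(data, c)
    scores[c] = xie_beni(data, v, u)
best_c = min(scores, key=scores.get)
print(scores, best_c)
```

Because an index of this kind uses compactness and separation directly, a single outlier that drags a centroid or inflates the compactness term can shift the minimum, which is the sensitivity to noise that the abstract attributes to existing CVIs.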
Pages: 10853 - 10872
Page count: 19
Related papers
50 records in total
  • [31] Application of comparative strainer clustering as a novel method of high volume of data clustering to optimal power flow problem
    Azizi, E.
    Ghaemi, S.
    Mohammadi-Ivatloo, B.
    Piran, Md. Jalil
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2019, 113 : 362 - 371
  • [32] Maxmin distance sort heuristic-based initial centroid method of partitional clustering for big data mining
    Kamlesh Kumar Pandey
    Diwakar Shukla
    Pattern Analysis and Applications, 2022, 25 : 139 - 156
  • [33] Directional Pattern based Clustering for Quantitative Survey Data: Method and Application
    Sadh, Roopam
    Kumar, Rajeev
    SURVEY RESEARCH METHODS, 2021, 15 (02) : 169 - 185
  • [34] An efficient topological-based clustering method on spatial data in network space
    Nguyen, Trang T. D.
    Nguyen, Loan T. T.
    Bui, Quang-Thinh
    Yun, Unil
    Vo, Bay
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 215
  • [35] Dynamic Micro-cluster-Based Streaming Data Clustering Method for Anomaly Detection
    Wang, Xiaolan
    Ahmed, Md Manjur
    Husen, Mohd Nizam
    Tao, Hai
    Zhao, Qian
    SOFT COMPUTING IN DATA SCIENCE, SCDS 2023, 2023, 1771 : 61 - 75
  • [36] Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering
    Reddy, H. Venkateswara
    Raju, S. Viswanadha
    Agrawal, Pratibha
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013 : 500 - 506
  • [37] SEP/COP: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index
    Gurrutxaga, Ibai
    Albisua, Inaki
    Arbelaitz, Olatz
    Martin, Jose I.
    Muguerza, Javier
    Perez, Jesus M.
    Perona, Inigo
    PATTERN RECOGNITION, 2010, 43 (10) : 3364 - 3373
  • [38] An improved method for k-means clustering based on internal validity indexes and inter-cluster variance
    Zhu, Guangli
    Li, Xiaoqing
    Zhang, Shunxiang
    Xu, Xin
    Zhang, Biao
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2022, 25 (03) : 253 - 261
  • [39] A new validity clustering index-based on finding new centroid positions using the mean of clustered data to determine the optimum number of clusters
    Abdalameer, Ahmed Khaldoon
    Alswaitti, Mohammed
    Alsudani, Ahmed Adnan
    Isa, Nor Ashidi Mat
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 191
  • [40] A spectral clustering-based optimal deployment method for scientific application in cloud computing
    Fan, Pei
    Wang, Ji
    Chen, Zhenbang
    Zheng, Zibin
    Lyu, Michael R.
    INTERNATIONAL JOURNAL OF WEB AND GRID SERVICES, 2012, 8 (01) : 31 - 55