Centroid-based clustering validity: method and application to quantification of optimal cluster-data space

Author
Nguyen, Sy Dzung [1, 2]
Affiliations
[1] Laboratory for Computational Mechatronics, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh, Viet Nam
[2] Faculty of Mechanical - Electrical and Computer Engineering, School of Technology, Van Lang University, Ho Chi Minh, Viet Nam
Funding
The authors would like to thank the Vietnam National Foundation for Science and Technology Development (NAFOSTED) for funding under Grant Number 107.01-2019.328.
DOI
10.1007/s00500-024-09871-0
Abstract
Evaluating clustering validity to set up an optimal cluster-data space (CDS) is a vital task in many fields related to data mining. Most existing clustering validity indexes (CVIs) lack stability because they are too sensitive to noise, especially impulse noise. Here, we (1) propose a new CVI named DzI (Dzung Index), or fRisk2, based on an analysis of the fuzzy-set-based accumulated risk degree (FARD), and (2) present a new algorithm named fRisk2-bA for determining the optimal number of data clusters. The approach is a method for evaluating the validity of centroid-based fuzzy clustering. In essence, fRisk2 still aims to enhance the data compression within each cluster and to widen the separation between cluster centroids; however, it exploits these features indirectly through FARD. As a result, the proposed method not only avoids the difficulties of traditional methods that rely directly on the compression and separation properties, but also better distills the local and global attributes of the data distribution, estimating the CDS more fully. Beyond the proven theoretical basis, surveys, including ones based on noisy measured datasets, showed the following comparative advantages of fRisk2: (1) its accuracy, stability, and convergence are outstanding; and (2) its total computational cost is lower than that of the other surveyed CVIs.
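The abstract does not give the FARD or fRisk2 formulas, so the sketch below does not implement fRisk2. It is a minimal illustration, under stated assumptions, of the general workflow the abstract refers to: run centroid-based fuzzy c-means for each candidate cluster count c, score each partition with a CVI, and keep the best c. As a stand-in CVI it uses the classic Xie-Beni index, one of the traditional indexes that relies directly on compression (numerator) and centroid separation (denominator). The function names (`fcm`, `maxmin_init`, `xie_beni`, `best_cluster_count`), the farthest-point initialisation, and the toy data are hypothetical choices made for this example.

```python
# Illustrative sketch only: NOT the fRisk2/DzI index from the paper.
# Fuzzy c-means + the classic Xie-Beni CVI, used as a hypothetical stand-in
# to show the "sweep c, score each partition, keep the best c" workflow.


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maxmin_init(data, c):
    """Deterministic farthest-point initialisation of c centroids."""
    centers = [list(data[0])]
    while len(centers) < c:
        # Add the point whose nearest chosen centroid is farthest away.
        far = max(data, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(far))
    return centers


def fcm(data, c, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means; returns centroids and the last membership matrix."""
    centers = maxmin_init(data, c)
    dim = len(data[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1)).
        u = []
        for v in centers:
            row = []
            for x in data:
                d_v = max(sq_dist(x, v), 1e-12)  # clamp avoids division by zero
                s = sum((d_v / max(sq_dist(x, w), 1e-12)) ** (1.0 / (m - 1))
                        for w in centers)
                row.append(1.0 / s)
            u.append(row)
        # Centroid update: mean of the data weighted by u_ik^m.
        new_centers = []
        for row in u:
            wts = [uik ** m for uik in row]
            total = sum(wts)
            new_centers.append([sum(wk * x[d] for wk, x in zip(wts, data)) / total
                                for d in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once no centroid moves noticeably
            break
    return centers, u


def xie_beni(data, centers, u, m=2.0):
    """Xie-Beni index: compression / (n * min centroid separation); lower is better."""
    compact = sum(u[i][k] ** m * sq_dist(x, v)
                  for i, v in enumerate(centers) for k, x in enumerate(data))
    sep = min(sq_dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (len(data) * max(sep, 1e-12))


def best_cluster_count(data, c_range=range(2, 7), m=2.0):
    """Score every candidate c and return the one with the lowest index."""
    scores = {}
    for c in c_range:
        centers, u = fcm(data, c, m)
        scores[c] = xie_beni(data, centers, u, m)
    return min(scores, key=scores.get), scores


# Toy data: three tight, well-separated 2-D blobs of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(bx + dx, by + dy) for bx, by in blobs for dx, dy in offsets]

best, scores = best_cluster_count(data)
print("chosen c:", best)
```

Only the scoring step depends on the CVI; a different index would presumably slot in where `xie_beni` is called, but the FARD-based scoring of fRisk2 is not reproduced here. The deterministic initialisation is used only so the toy run is repeatable.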
Pages: 10853-10872
Number of pages: 19