Centroid-based clustering validity: method and application to quantification of optimal cluster-data space

Authors
Nguyen, Sy Dzung [1, 2]
Affiliations
[1] Laboratory for Computational Mechatronics, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh, Viet Nam
[2] Faculty of Mechanical - Electrical and Computer Engineering, School of Technology, Van Lang University, Ho Chi Minh, Viet Nam
Funding
The authors thank the Vietnam National Foundation for Science and Technology Development (NAFOSTED) for support under Grant Number 107.01-2019.328.
DOI
10.1007/s00500-024-09871-0
Abstract
Evaluating clustering validity to establish an optimal cluster-data space (CDS) is a vital task in many fields related to data mining. Most existing clustering validity indexes (CVIs) lack stability because they are too sensitive to noise, especially impulse noise. Here, we (1) propose a new CVI named DzI (Dzung Index), or fRisk2, based on an analysis of the fuzzy-set-based accumulated risk degree (FARD), and (2) present a new algorithm named fRisk2-bA for determining the optimal number of data clusters. The method evaluates the validity of centroid-based fuzzy clustering. In essence, fRisk2 still aims to increase the data compression within each cluster and the separation between cluster centroids; however, it exploits these features indirectly, through FARD. As a result, the proposed method not only avoids the difficulties of traditional indexes that rely directly on the compression and separation properties but also better captures local and global attributes of the data distribution, giving a fuller estimate of the CDS. Together with the proven theoretical basis, surveys, including ones based on noisy measured datasets, showed the following comparative advantages of fRisk2: (1) its accuracy, stability, and convergence are outstanding; (2) its total computational cost is lower than that of the other surveyed CVIs.
Pages: 10853-10872
Number of pages: 19
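
Illustrative sketch
The abstract does not define fRisk2 or FARD, so the following is not the authors' method. It is a minimal sketch of the general workflow the abstract refers to: run centroid-based fuzzy clustering for several candidate cluster counts, score each result with a validity index, and keep the best count. As a stand-in index it uses the classical Xie-Beni index, one of the traditional CVIs that use compactness and separation directly. All function names, the farthest-point initialisation, and the parameter defaults are assumptions made for illustration.

```python
import math


def farthest_point_init(points, c):
    """Deterministic start (an assumption of this sketch): spread c seeds across the data."""
    centroids = [points[0]]
    while len(centroids) < c:
        centroids.append(max(points, key=lambda p: min(math.dist(p, v) for v in centroids)))
    return [list(v) for v in centroids]


def memberships(points, centroids, m):
    """Standard fuzzy c-means membership update; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point sitting exactly on a centroid does not divide by zero.
        d = [max(math.dist(p, v), 1e-12) for v in centroids]
        u.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return u


def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Plain fuzzy c-means with a fixed iteration count; returns (centroids, memberships)."""
    dim = len(points[0])
    centroids = farthest_point_init(points, c)
    for _ in range(iters):
        u = memberships(points, centroids, m)
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            centroids[k] = [sum(wi * p[a] for wi, p in zip(w, points)) / total
                            for a in range(dim)]
    # Recompute memberships so they match the final centroids.
    return centroids, memberships(points, centroids, m)


def xie_beni(points, centroids, u, m=2.0):
    """Xie-Beni index: fuzzy compactness over (n * minimum squared centroid gap). Lower is better."""
    compact = sum(u[i][k] ** m * math.dist(p, v) ** 2
                  for i, p in enumerate(points)
                  for k, v in enumerate(centroids))
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centroids)
              for b in centroids[i + 1:])
    return compact / (len(points) * max(sep, 1e-12))


def select_num_clusters(points, c_min=2, c_max=5):
    """Score every candidate cluster count and return (best_count, all_scores)."""
    scores = {}
    for c in range(c_min, c_max + 1):
        centroids, u = fuzzy_c_means(points, c)
        scores[c] = xie_beni(points, centroids, u)
    return min(scores, key=scores.get), scores
```

Calling `select_num_clusters(points)` on a list of coordinate tuples returns the candidate count with the lowest index value together with the per-count scores. Because the Xie-Beni index divides by the smallest centroid gap and sums raw distances, a single outlier can shift it noticeably, which is the kind of noise sensitivity the abstract attributes to existing CVIs.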