Centroid-based clustering validity: method and application to quantification of optimal cluster-data space

Cited by: 0
Author
Nguyen, Sy Dzung [1, 2]
Affiliations
[1] Laboratory for Computational Mechatronics, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh, Viet Nam
[2] Faculty of Mechanical - Electrical and Computer Engineering, School of Technology, Van Lang University, Ho Chi Minh, Viet Nam
Funding
The authors thank the Vietnam National Foundation for Science and Technology Development (NAFOSTED), Grant Number 107.01-2019.328.
DOI
10.1007/s00500-024-09871-0
Abstract
Evaluating clustering validity to establish an optimal cluster-data space (CDS) is a vital task in many fields related to data mining. Almost all existing clustering validity indexes (CVIs) lack stability because they are too sensitive to noise, especially impulse noise. Here, we (1) propose a new CVI named DzI (Dzung Index), or fRisk2, based on an analysis of the fuzzy-set-based accumulated risk degree (FARD), and (2) present a new algorithm named fRisk2-bA for determining the optimal number of data clusters. The method evaluates the validity of centroid-based fuzzy clustering. In essence, fRisk2 still focuses on enhancing the data compression within each cluster and expanding the separation between cluster centroids; however, these features are exploited indirectly through FARD. As a result, the proposed method not only avoids the difficulties of traditional methods that rely directly on the compression and separation properties but also distills the local and global attributes of the data distribution better, so that the CDS is estimated more fully. Along with the proven theoretical basis, surveys, including ones based on noisy datasets from measurements, showed the following comparative advantages of fRisk2. (1) Its accuracy, stability, and convergence are outstanding. (2) Its total computational cost is lower than that of the other surveyed CVIs.
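For orientation, the "traditional" approach that the abstract contrasts with fRisk2 can be illustrated with a minimal sketch of the Xie-Beni index, a well-known CVI that uses within-cluster compression and centroid separation directly. This is not the fRisk2 or FARD method, whose formulas are not given in this record; the function names `sq_dist` and `xie_beni`, the toy data, and the choice of fuzzifier `m = 2` are assumptions made only for this example.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centroids, memberships, m=2.0):
    """Xie-Beni index (lower is better); illustrative helper, not fRisk2.

    memberships[i][k] is the fuzzy membership of point k in cluster i.
    """
    # Compression term: membership-weighted squared distances to centroids.
    compression = sum(
        (memberships[i][k] ** m) * sq_dist(points[k], centroids[i])
        for i in range(len(centroids))
        for k in range(len(points))
    )
    # Separation term: smallest squared distance between any two centroids.
    separation = min(sq_dist(a, b) for a, b in combinations(centroids, 2))
    return compression / (len(points) * separation)


# Toy data (assumed): two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A partition that matches the two groups.
good = xie_beni(points, [(0.0, 0.5), (10.0, 0.5)], [[1, 1, 0, 0], [0, 0, 1, 1]])

# A partition that splits each group across both clusters.
bad = xie_beni(points, [(5.0, 0.0), (5.0, 1.0)], [[1, 0, 1, 0], [0, 1, 0, 1]])

print(good, bad)
```

The partition that matches the two groups receives the smaller index value. Because both terms are built from raw squared distances, a single outlying point can shift them noticeably, which is the kind of noise sensitivity the abstract attributes to existing CVIs.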
Pages: 10853-10872
Number of pages: 19