On efficient model selection for sparse hard and fuzzy center-based clustering algorithms

Cited by: 6
Authors
Gupta, Avisek [1]
Das, Swagatam [1]
Affiliations
[1] Indian Stat Inst, Elect & Commun Sci Unit, 203 BT Rd, Kolkata 700108, W Bengal, India
Keywords
Sparse clustering; Model selection; Sparse k-means; Sparse fuzzy c-means; Bayesian information criterion; VALIDITY INDEX; C-MEANS; NUMBER
DOI
10.1016/j.ins.2021.12.070
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Center-based clustering algorithms identify clusters efficiently, which makes them applicable to larger data sets. While a data set may contain many features, not all of them are equally informative or helpful for cluster detection. Sparse center-based clustering methods therefore select only those features that are useful in identifying the clusters present in a data set. However, to automatically determine the degree to which features should be selected, these methods use the Permutation Method, which involves generating and clustering multiple randomly permuted data sets and thus incurs much higher computation costs. In this paper, we propose an improved approach to model selection for sparse clustering that uses expressions of the Bayesian Information Criterion (BIC) derived for the center-based clustering methods of k-Means and Fuzzy c-Means. The derived expressions of BIC require significantly lower computation costs, yet allow us to compare and select a suitable sparse clustering among several possible sparse partitions that may have selected different subsets of features. Experiments on synthetic and real-world data sets show that using BIC for model selection leads to remarkable improvements in the identification of sparse clusterings for both Sparse k-Means and Sparse Fuzzy c-Means. © 2022 Elsevier Inc. All rights reserved.
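The idea of scoring several candidate sparse partitions with a BIC-style criterion, instead of clustering many permuted data sets, can be sketched roughly as follows. This is only a minimal illustration and not the expressions derived in the paper: it assumes a hard-assignment spherical Gaussian model with one shared variance, models every unselected feature with a single global mean so that candidates using different feature subsets are scored on the same data, and uses hypothetical helper names (`kmeans`, `bic_for_subset`) together with a toy synthetic data set.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def bic_for_subset(X, labels, selected, k):
    """Illustrative BIC (lower is better) under the assumptions stated above.

    Selected features get one mean per cluster; unselected features get a
    single global mean. This is NOT the paper's derived expression.
    """
    n, p = X.shape
    mask = np.zeros(p, dtype=bool)
    mask[list(selected)] = True
    fitted = np.tile(X.mean(axis=0), (n, 1))
    for j in np.unique(labels):
        members = labels == j
        fitted[members] = np.where(mask, X[members].mean(axis=0), fitted[members])
    sigma2 = ((X - fitted) ** 2).sum() / (n * p)  # shared variance estimate
    log_lik = -0.5 * n * p * (np.log(2 * np.pi * sigma2) + 1.0)
    n_params = k * mask.sum() + (p - mask.sum()) + 1  # means plus one variance
    return -2.0 * log_lik + n_params * np.log(n)


# Toy data (an assumption for illustration): 2 informative and 8 noise features.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_labels = np.repeat(np.arange(3), 100)
X = np.hstack([true_centers[true_labels] + rng.normal(size=(300, 2)),
               rng.normal(size=(300, 8))])

# Each candidate stands in for one sparse partition with its own feature subset.
candidates = [(0,), (0, 1), tuple(range(10))]
scores = {}
for subset in candidates:
    labels = kmeans(X[:, list(subset)], k=3)
    scores[subset] = bic_for_subset(X, labels, subset, k=3)

best_subset = min(scores, key=scores.get)
print(best_subset, scores)
```

Each candidate is clustered once and scored once, which is where the saving over a permutation-based procedure would come from. In a real sparse clustering method the candidates would arise from different sparsity levels rather than from a hand-written list.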
Pages: 29-44
Number of pages: 16