Simultaneous feature selection and clustering using mixture models

Cited by: 429
Authors
Law, MHC
Figueiredo, MAT
Jain, AK
Institutions
[1] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[2] Inst Super Tecn, Inst Telecomunicacoes, P-1049001 Lisbon, Portugal
Keywords
feature selection; clustering; unsupervised learning; mixture models; minimum message length; EM algorithm;
DOI
10.1109/TPAMI.2004.71
CLC classification number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
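The feature-saliency idea summarized in the abstract can be sketched in code. Below is a minimal NumPy illustration (not the paper's implementation): each feature l has a saliency rho[l] that mixes a component-specific Gaussian with a common "irrelevant-feature" Gaussian shared by all components, and EM estimates both jointly. The function name `saliency_em` and the initialization choices are assumptions; the update for rho shown here is the plain maximum-likelihood one, without the paper's minimum-message-length penalty terms (which are what actually drive irrelevant saliencies all the way to zero and select the number of clusters).

```python
import numpy as np

def gauss(x, mu, var):
    """Univariate Gaussian density, broadcasting over array arguments."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def saliency_em(X, K, n_iter=50, eps=1e-6):
    """EM sketch for a mixture with per-feature saliencies rho.

    Feature l of a point is modelled, with probability rho[l], by a
    component-specific Gaussian, and otherwise by a common Gaussian
    shared by all components (the "irrelevant" density).
    """
    N, D = X.shape
    alpha = np.full(K, 1.0 / K)                      # mixing proportions
    qs = (np.arange(K) + 0.5) / K
    mu = np.quantile(X, qs, axis=0)                  # (K, D) component means
    var = np.tile(X.var(axis=0), (K, 1))             # (K, D) component variances
    rho = np.full(D, 0.5)                            # feature saliencies
    mu_c, var_c = X.mean(axis=0), X.var(axis=0)      # common "irrelevant" density
    for _ in range(n_iter):
        # E-step: relevant vs. irrelevant evidence per point/component/feature
        a = rho * gauss(X[:, None, :], mu, var)              # (N, K, D)
        b = (1.0 - rho) * gauss(X[:, None, :], mu_c, var_c)  # (N, 1, D)
        c = a + b + eps
        logw = np.log(alpha) + np.log(c).sum(axis=2)         # (N, K)
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)        # component responsibilities
        u = w[:, :, None] * (a / c)      # in component j AND feature relevant
        v = w[:, :, None] - u            # in component j AND feature irrelevant
        # M-step
        alpha = w.mean(axis=0)
        su = u.sum(axis=0) + eps
        mu = (u * X[:, None, :]).sum(axis=0) / su
        var = (u * (X[:, None, :] - mu) ** 2).sum(axis=0) / su + eps
        sv = v.sum(axis=(0, 1)) + eps
        mu_c = (v * X[:, None, :]).sum(axis=(0, 1)) / sv
        var_c = (v * (X[:, None, :] - mu_c) ** 2).sum(axis=(0, 1)) / sv + eps
        # Plain ML saliency update; the MML penalty (omitted here) is what
        # pushes the saliency of irrelevant features all the way to zero.
        rho = u.sum(axis=(0, 1)) / N
    return rho, alpha, mu, var

# Demo: feature 0 separates two clusters, feature 1 is pure noise.
rng = np.random.default_rng(1)
x0 = np.concatenate([rng.normal(-3, 1, 60), rng.normal(3, 1, 60)])
x1 = rng.normal(0, 1, 120)
X = np.column_stack([x0, x1])
rho, alpha, mu, var = saliency_em(X, K=2)
```

On this synthetic data, the saliency of the cluster-separating feature climbs toward 1 while the noise feature stays near its initial value; with the MML criterion of the paper, the noise feature's saliency would instead be driven toward zero, and the same criterion extends to choosing K.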
Pages: 1154 - 1166
Page count: 13
Related papers (50 in total)
  • [1] Simultaneous high-dimensional clustering and feature selection using asymmetric Gaussian mixture models
    Elguebaly, Tarek
    Bouguila, Nizar
    [J]. IMAGE AND VISION COMPUTING, 2015, 34 : 27 - 41
  • [2] Simultaneous clustering and feature selection via nonparametric Pitman-Yor process mixture models
    Fan, Wentao
    Bouguila, Nizar
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (10) : 2753 - 2766
  • [3] Robust simultaneous positive data clustering and unsupervised feature selection using generalized inverted Dirichlet mixture models
    Al Mashrgy, Mohamed
    Bdiri, Taoufik
    Bouguila, Nizar
    [J]. KNOWLEDGE-BASED SYSTEMS, 2014, 59 : 182 - 195
  • [4] Simultaneous Bayesian clustering and feature selection using RJMCMC-based learning of finite generalized Dirichlet mixture models
    Elguebaly, Tarek
    Bouguila, Nizar
    [J]. SIGNAL PROCESSING, 2013, 93 (06) : 1531 - 1546
  • [5] Background subtraction using infinite asymmetric Gaussian mixture models with simultaneous feature selection
    Song, Ziyang
    Ali, Samr
    Bouguila, Nizar
    [J]. IET IMAGE PROCESSING, 2020, 14 (11) : 2321 - 2332
  • [6] Simultaneous Feature Selection and Clustering Using Particle Swarm Optimization
    Swetha, K. P.
    Devi, V. Susheela
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 509 - 515
  • [7] Automatic Clustering simultaneous Feature Subset Selection using Differential Evolution
    Srinivas, V. Sesha
    Srikrishna, A.
    Reddy, B. Eswara
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2018, : 468 - 473
  • [8] Simultaneous feature selection and symmetry based clustering using multiobjective framework
    Saha, Sriparna
    Spandana, Rachamadugu
    Ekbal, Asif
    Bandyopadhyay, Sanghamitra
    [J]. APPLIED SOFT COMPUTING, 2015, 29 : 479 - 486
  • [9] Simultaneous feature selection and ant colony clustering
    Akarsu, Emre
    Karahoca, Adem
    [J]. WORLD CONFERENCE ON INFORMATION TECHNOLOGY (WCIT-2010), 2011, 3