Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering

Cited by: 0
Authors
Celeux, Gilles [1]
Martin-Magniette, Marie-Laure [2, 3]
Maugis-Rabusseau, Cathy [4]
Raftery, Adrian E. [5, 6]
Affiliations
[1] Inria Saclay Ile de France, Orsay, France
[2] UEVE ERL CNRS 8196, Unite Rech Genom Vegetale, UMR INRA 1165, Evry, France
[3] UMR AgroParisTech, INRA MIA 518, Paris, France
[4] Univ Toulouse, INSA Toulouse, Inst Math Toulouse, Toulouse, France
[5] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[6] Univ Coll Dublin, Sch Math Sci, Dublin, Ireland
Source
JOURNAL OF THE SFDS | 2014 / Vol. 155 / No. 02
Funding
Science Foundation Ireland;
Keywords
Model-based clustering; Model selection; Regularization approach; Variable selection;
DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than K-means without variable selection. However, the model selection approach is not available in very high-dimensional settings.
Pages: 57-71
Number of pages: 15
Related papers
  • [2] Variable selection in model-based clustering and discriminant analysis with a regularization approach
    Celeux, Gilles
    Maugis-Rabusseau, Cathy
    Sedki, Mohammed
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 (01) : 259 - 278
  • [3] Variable selection for model-based clustering
    Raftery, AE
    Dean, N
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 168 - 178
  • [4] Variable selection in penalized model-based clustering via regularization on grouped parameters
    Xie, Benhuai
    Pan, Wei
    Shen, Xiaotong
    [J]. BIOMETRICS, 2008, 64 (03) : 921 - 930
  • [5] Variable selection methods for model-based clustering
    Fop, Michael
    Murphy, Thomas Brendan
    [J]. STATISTICS SURVEYS, 2018, 12 : 18 - 65
  • [6] Penalized model-based clustering with application to variable selection
    Pan, Wei
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 1145 - 1164
  • [7] Variable selection in model-based clustering: A general variable role modeling
    Maugis, C.
    Celeux, G.
    Martin-Magniette, M. -L.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2009, 53 (11) : 3872 - 3882
  • [8] A simple model-based approach to variable selection in classification and clustering
    Partovi Nia, Vahid
    Davison, Anthony C.
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2015, 43 (02) : 157 - 175
  • [9] Variable selection for model-based high-dimensional clustering
    Wang, Sijian
    Zhu, Ji
    [J]. PREDICTION AND DISCOVERY, 2007, 443 : 177 - +
  • [10] Regularization and variable selection in Heckman selection model
    Emmanuel O. Ogundimu
    [J]. Statistical Papers, 2022, 63 : 421 - 439