Model-based clustering of high-dimensional data: Variable selection versus facet determination

Cited by: 15
Authors
Poon, Leonard K. M. [1 ]
Zhang, Nevin L. [1 ]
Liu, Tengfei [1 ]
Liu, April H. [1 ]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Model-based clustering; Facet determination; Variable selection; Latent tree models; Gaussian mixture models; LIKELIHOOD;
DOI
10.1016/j.ijar.2012.08.001
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to identify various facets of a data set (each being based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of the Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and cluster data along each of those facets simultaneously. We present empirical results to show that facet determination usually leads to better clustering results than variable selection. (C) 2012 Elsevier Inc. All rights reserved.
Pages: 196-215
Number of pages: 20