Model-Based Clustering for Conditionally Correlated Categorical Data

Authors
Matthieu Marbac
Christophe Biernacki
Vincent Vandewalle
Affiliations
[1] Inria Lille and DGA
[2] University Lille 1
[3] CNRS and Inria
[4] University Lille 2 and Inria
Source
Journal of Classification | 2015 / Volume 32
Keywords
Categorical data; Clustering; Correlation; Expectation-Maximization algorithm; Gibbs sampler; Mixture model; Model selection
Abstract
An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of variables. The model groups the variables into inter-independent and intra-dependent blocks in order to capture the main intra-class correlations. The dependency between variables grouped inside the same block of a class is taken into account by mixing two extreme distributions: the independence distribution and the maximum dependency distribution. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, since it produces a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood using an EM algorithm. Moreover, a Gibbs sampler is used for model selection in order to overcome the computational intractability of the combinatorial problems raised by the block structure search. Two applications on medical and biological data sets show the relevance of this new model. The results strengthen the view that this model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
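The idea of mixing the two extreme distributions inside a block can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `block_joint`, the parameter names `alpha1`, `alpha2`, `tau` and `rho`, and the restriction to a block of two variables with the same number of categories are all assumptions made for illustration. In this simplified setting, "maximum dependency" is taken to mean that the second variable always equals the first, so all probability mass sits on the diagonal of the joint table.

```python
import numpy as np


def block_joint(alpha1, alpha2, tau, rho):
    """Joint pmf of a two-variable block within one class (illustrative only).

    alpha1, alpha2 : marginal probabilities under the independence part
    tau            : probabilities of the diagonal cells under the
                     (assumed) maximum dependency part
    rho            : mixing weight of the maximum dependency part, in [0, 1]
    """
    alpha1 = np.asarray(alpha1, dtype=float)
    alpha2 = np.asarray(alpha2, dtype=float)
    tau = np.asarray(tau, dtype=float)

    # Independence: the joint table is the outer product of the marginals.
    independent = np.outer(alpha1, alpha2)

    # Assumed maximum dependency: the two variables always take the same
    # category, so only the diagonal cells carry probability mass.
    max_dependent = np.diag(tau)

    # Convex combination of the two extreme distributions.
    return (1.0 - rho) * independent + rho * max_dependent


# Two binary variables, uniform parameters, 60% weight on full dependency.
joint = block_joint([0.5, 0.5], [0.5, 0.5], [0.5, 0.5], rho=0.6)
print(joint)
```

With `rho = 0` the block reduces to the usual conditional independence table of the latent class model, and with `rho = 1` the two variables are deterministically linked, so a single extra weight per block interpolates between the two cases.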
Pages: 145-175
Page count: 30