Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm

Cited by: 21
Authors
Biernacki, Christophe [1 ,2 ]
Jacques, Julien [3 ]
Affiliations
[1] Univ Lille 1, Lab Painleve, F-59655 Villeneuve d'Ascq, France
[2] Univ Lille 1, Inria, F-59655 Villeneuve d'Ascq, France
[3] Univ Lyon 2, Lab ERIC, F-69676 Bron, France
Keywords
Ordinal data; Binary search algorithm; Latent variables; AECM algorithm
DOI
10.1007/s11222-015-9585-2
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We design a probability distribution for ordinal data by modeling the process generating data, which is assumed to rely only on order comparisons between categories. Contrariwise, most competitors often either forget the order information or add a non-existent distance information. The data generating process is assumed, from optimality arguments, to be a stochastic binary search algorithm in a sorted table. The resulting distribution is natively governed by two meaningful parameters (position and precision) and has very appealing properties: decrease around the mode, shape tuning from uniformity to a Dirac, identifiability. Moreover, it is easily estimated by an EM algorithm since the path in the stochastic binary search algorithm can be considered as missing values. Using then the classical latent class assumption, the previous univariate ordinal model is straightforwardly extended to model-based clustering for multivariate ordinal data. Parameters of this mixture model are estimated by an AECM algorithm. Both simulated and real data sets illustrate the great potential of this model by its ability to parsimoniously identify particularly relevant clusters which were unsuspected by some traditional competitors.
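The generating process described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' implementation: the abstract does not spell out the search steps, so the three-way split around a uniformly drawn breakpoint, the "blind" choice proportional to sub-interval size, and the names `stochastic_binary_search`, `mu` (position) and `pi` (precision) are all assumptions made here for illustration.

```python
import random


def stochastic_binary_search(m, mu, pi, rng):
    """Draw one ordinal value in 1..m (illustrative sketch, assumptions labeled).

    m   : number of ordered categories
    mu  : position parameter, a category in 1..m (hypothetical name)
    pi  : precision parameter in [0, 1] (hypothetical name)
    rng : a random.Random instance
    """
    lo, hi = 1, m
    while lo < hi:
        # Assumed step 1: pick a breakpoint uniformly in the current interval.
        y = rng.randint(lo, hi)
        # Assumed step 2: split into left part, the breakpoint itself, right part.
        parts = [(lo, y - 1), (y, y), (y + 1, hi)]
        parts = [(a, b) for a, b in parts if a <= b]  # drop empty parts

        if rng.random() < pi:
            # Accurate comparison: keep the part closest to mu
            # (distance 0 when mu lies inside the part).
            def distance(part):
                a, b = part
                return 0 if a <= mu <= b else min(abs(mu - a), abs(mu - b))

            lo, hi = min(parts, key=distance)
        else:
            # Blind comparison: keep a part with probability proportional to its size.
            weights = [b - a + 1 for a, b in parts]
            lo, hi = rng.choices(parts, weights=weights, k=1)[0]
    return lo


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [stochastic_binary_search(5, 2, 0.5, rng) for _ in range(10)]
    print(draws)
```

In this sketch, `pi = 1` always returns `mu` (the Dirac end of the range mentioned in the abstract), while `pi = 0` makes every step blind and yields a uniform draw over the `m` categories. Only order comparisons between categories are used to decide which part to keep when the comparison is accurate.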
Pages: 929-943
Number of pages: 15
Related papers
50 in total
  • [1] Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm
    Biernacki, Christophe
    Jacques, Julien
    [J]. Statistics and Computing, 2016, 26: 929-943
  • [2] Bayesian model-based clustering for longitudinal ordinal data
    Costilla, Roy
    Liu, Ivy
    Arnold, Richard
    Fernández, Daniel
    [J]. Computational Statistics, 2019, 34 (03): 1015-1038
  • [3] Model-based co-clustering for ordinal data
    Jacques, Julien
    Biernacki, Christophe
    [J]. Computational Statistics & Data Analysis, 2018, 123: 101-115
  • [4] Model-based clustering for multivariate functional data
    Jacques, Julien
    Preda, Cristian
    [J]. Computational Statistics & Data Analysis, 2014, 71: 92-106
  • [5] A Model-Based Multivariate Time Series Clustering Algorithm
    Zhou, Pei-Yuan
    Chan, Keith C. C.
    [J]. Trends and Applications in Knowledge Discovery and Data Mining, 2014, 8643: 805-817
  • [6] Model-based clustering for multivariate partial ranking data
    Jacques, Julien
    Biernacki, Christophe
    [J]. Journal of Statistical Planning and Inference, 2014, 149: 201-217
  • [7] Probabilistic model-based clustering of multivariate and sequential data
    Smyth, P
    [J]. Artificial Intelligence and Statistics 99, Proceedings, 1999: 299-304
  • [8] A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data
    Ranalli, Monia
    Rocci, Roberto
    [J]. Psychometrika, 2017, 82 (04): 1007-1034