Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA

Cited by: 0
Authors
Anastasios Bellas
Charles Bouveyron
Marie Cottrell
Jérôme Lacaille
Affiliations
[1] SAMM (EA 4543)
[2] Université Paris 1
[3] Snecma
[4] Groupe Safran
Keywords
Model-based clustering; Mixture of probabilistic PCA; Data streams; High-dimensional data; Online inference; 62; 62-07; 62H25; 62H30
DOI
Not available
Abstract
Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly when dealing with high-dimensional data streams, which are nowadays a frequent data type. To overcome this limitation of model-based clustering, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic and incremental version of PCA. Model selection is also considered in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.
Pages: 281-300
Number of pages: 19
Related papers
50 in total
  • [1] Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA
    Bellas, Anastasios
    Bouveyron, Charles
    Cottrell, Marie
    Lacaille, Jerome
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2013, 7 (03) : 281 - 300
  • [2] Model-based clustering of high-dimensional data: A review
    Bouveyron, Charles
    Brunet-Saumard, Camille
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 52 - 78
  • [3] MODEL-BASED CLUSTERING OF HIGH-DIMENSIONAL DATA IN ASTROPHYSICS
    Bouveyron, C.
    [J]. STATISTICS FOR ASTROPHYSICS: CLUSTERING AND CLASSIFICATION, 2016, 77 : 91 - 119
  • [4] Model-based regression clustering for high-dimensional data: application to functional data
    Devijver, Emilie
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (02) : 243 - 279
  • [5] Model-based clustering of high-dimensional longitudinal data via regularization
    Yang, Luoying
    Wu, Tong Tong
    [J]. BIOMETRICS, 2023, 79 (02) : 761 - 774
  • [6] Variable selection for model-based high-dimensional clustering
    Wang, Sijian
    Zhu, Ji
    [J]. PREDICTION AND DISCOVERY, 2007, 443 : 177 - +
  • [7] A Hierarchical Gamma Mixture Model-Based Method for Classification of High-Dimensional Data
    Azhar, Muhammad
    Li, Mark Junjie
    Huang, Joshua Zhexue
    [J]. ENTROPY, 2019, 21 (09)
  • [8] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    [J]. APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 886 - 890
  • [9] A grid-based clustering algorithm for high-dimensional data streams
    Lu, YS
    Sun, YF
    Xu, GP
    Liu, G
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 824 - 831
  • [10] An entropy weighting mixture model for subspace clustering of high-dimensional data
    Peng, Liuqing
    Zhang, Junying
    [J]. PATTERN RECOGNITION LETTERS, 2011, 32 (08) : 1154 - 1161