Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA

Cited by: 0
Authors
Anastasios Bellas
Charles Bouveyron
Marie Cottrell
Jérôme Lacaille
Affiliations
[1] SAMM (EA 4543)
[2] Université Paris 1
[3] Snecma
[4] Groupe Safran
Keywords
Model-based clustering; Mixture of probabilistic PCA; Data streams; High-dimensional data; Online inference; 62; 62-07; 62H25; 62H30
DOI
Not available
Abstract
Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly when dealing with high-dimensional data streams, which are nowadays a frequent data type. To overcome this limitation of model-based clustering, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic and incremental version of PCA. Model selection is also considered in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.
Pages: 281-300
Page count: 19