A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges

Citations: 0
Authors
Biernacki, C. [1]
Jacques, J. [2]
Keribin, C. [3]
Affiliations
[1] Univ Lille, Inria, CNRS, Lab Math Painleve, F-59650 Villeneuve d'Ascq, France
[2] Univ Lyon, Lyon 2, ERIC UR 3083, 5 Ave Pierre Mendes France, F-69676 Bron, France
[3] Univ Paris Saclay, CNRS, Inria, Lab Math Orsay, F-91405 Orsay, France
Keywords
High-dimension clustering; Mixture models; EM-like algorithms; Model selection; Mixed data types; MAXIMUM-LIKELIHOOD-ESTIMATION; UNIVARIATE GAUSSIAN MIXTURES; LATENT BLOCK MODEL; VARIABLE SELECTION; ASYMPTOTIC NORMALITY; EM ALGORITHM; CONSISTENCY; DEGENERACY; DENSITY
DOI
10.1007/s00357-023-09441-3
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Model-based co-clustering can be seen as a particularly important extension of model-based clustering. It allows for a significant reduction of both the number of rows (individuals) and columns (variables) of a data set in a parsimonious manner, and also allows interpretability of the resulting reduced data set since the meaning of the initial individuals and features is preserved. Moreover, it benefits from the rich statistical theory for both estimation and model selection. Many works have produced new advances on this topic in recent years, and this paper offers a general update of the related literature. In addition, we advocate two main messages, supported by specific research material: (1) co-clustering requires further research to fix some well-identified estimation issues, and (2) co-clustering is one of the most promising approaches for clustering in the (very) high-dimensional setting, which corresponds to the global trend in modern data sets.
Pages: 332-381
Number of pages: 50
Related papers
50 in total
  • [1] A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges
    C. Biernacki
    J. Jacques
    C. Keribin
    [J]. Journal of Classification, 2023, 40 : 332 - 381
  • [2] Model-based Co-clustering for High Dimensional Sparse Data
    Salah, Aghiles
    Rogovschi, Nicoleta
    Nadif, Mohamed
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 866 - 874
  • [3] Model-based co-clustering for ordinal data
    Jacques, Julien
    Biernacki, Christophe
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 123 : 101 - 115
  • [4] Model-based co-clustering for functional data
    Ben Slimen, Yosra
    Allio, Sylvain
    Jacques, Julien
    [J]. NEUROCOMPUTING, 2018, 291 : 97 - 108
  • [5] Model-based co-clustering for mixed type data
    Selosse, Margot
    Jacques, Julien
    Biernacki, Christophe
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 144
  • [6] Model-based Poisson co-clustering for Attributed Networks
    Riverain, Paul
    Fossier, Simon
    Nadif, Mohamed
    [J]. 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 703 - 710
  • [7] blockcluster: An R Package for Model-Based Co-Clustering
    Bhatia, Parmeet Singh
    Iovleff, Serge
    Govaert, Gerard
    [J]. JOURNAL OF STATISTICAL SOFTWARE, 2017, 76 (09): : 1 - 24
  • [8] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    [J]. APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 886 - 890
  • [9] Co-clustering contaminated data: a robust model-based approach
    Edoardo Fibbi
    Domenico Perrotta
    Francesca Torti
    Stefan Van Aelst
    Tim Verdonck
    [J]. Advances in Data Analysis and Classification, 2024, 18 : 121 - 161
  • [10] Co-clustering contaminated data: a robust model-based approach
    Fibbi, Edoardo
    Perrotta, Domenico
    Torti, Francesca
    Van Aelst, Stefan
    Verdonck, Tim
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (01) : 121 - 161