Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach

Cited by: 0
Authors
Baek, Jangsun [1]
Park, Jeong-Soo [1]
Affiliations
[1] Chonnam Natl Univ, Dept Stat, Gwangju, South Korea
Source
AMERICAN STATISTICIAN | 2023 / Vol. 77 / Issue 03
Funding
National Research Foundation, Singapore;
Keywords
Categorical data; Model-based clustering; Networks; Penalized composite likelihood; K-MEANS ALGORITHM; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; LATENT; ANALYZERS;
DOI
10.1080/00031305.2022.2141856
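Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes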
020208 ; 070103 ; 0714 ;
Abstract
One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables in clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. The mixture of a log-linear model is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data for clustering is considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and satisfy asymptotic normality. The performance of the proposed approach is shown to be better in comparison with the conventional methods for both synthetic and real datasets.
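The network view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows, under stated assumptions, how one categorical observation maps to pairwise linked nodes (one node per attribute and response level) and how pairwise co-occurrence counts, the kind of statistic a pairwise composite likelihood works with, can be tallied. The function names `observation_to_edges` and `pairwise_counts` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def observation_to_edges(obs):
    """Return all pairwise links between the (attribute index, level) nodes
    of a single categorical observation (hypothetical helper)."""
    nodes = [(j, level) for j, level in enumerate(obs)]
    return list(combinations(nodes, 2))


def pairwise_counts(data):
    """Count how often each pair of (attribute, level) nodes co-occurs
    across all observations (hypothetical helper)."""
    counts = Counter()
    for obs in data:
        counts.update(observation_to_edges(obs))
    return counts


# Toy data: three observations on three categorical attributes (assumed).
data = [
    ("a", "x", "lo"),
    ("a", "y", "lo"),
    ("b", "x", "hi"),
]

edges = observation_to_edges(data[0])
counts = pairwise_counts(data)
```

With p attributes, each observation contributes p(p-1)/2 links, so the statistics grow with the number of attribute pairs rather than with the full contingency table. The penalization and EM steps mentioned in the abstract are not shown here.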
Pages: 259 - 273
Page count: 15
Related papers
50 in total
  • [31] A Mixture Model Approach for Clustering Bipartite Networks
    Gollini, Isabella
    CHALLENGES IN SOCIAL NETWORK RESEARCH: METHODS AND APPLICATIONS, 2020, : 79 - 91
  • [32] Penalized Whittle likelihood for spatial data
    Chen, Kun
    Chan, Ngai Hang
    Yau, Chun Yip
    Hu, Jie
    JOURNAL OF MULTIVARIATE ANALYSIS, 2023, 195
  • [33] A hybrid data transformation approach for privacy preserving clustering of categorical data
    Natarajan, A. M.
    Rajalaxmi, R. R.
    Uma, N.
    Kirubhakar, G.
    INNOVATIONS AND ADVANCED TECHNIQUES IN COMPUTER AND INFORMATION SCIENCES AND ENGINEERING, 2007, : 403 - 408
  • [34] Mixture of latent trait analyzers for model-based clustering of categorical data
    Isabella Gollini
    Thomas Brendan Murphy
    Statistics and Computing, 2014, 24 : 569 - 588
  • [35] Mixture of latent trait analyzers for model-based clustering of categorical data
    Gollini, Isabella
    Murphy, Thomas Brendan
    STATISTICS AND COMPUTING, 2014, 24 (04) : 569 - 588
  • [36] A rival penalized EM algorithm towards maximizing weighted likelihood for density mixture clustering with automatic model selection
    Cheung, YM
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, 2004, : 633 - 636
  • [37] An integer-coded evolutionary approach for mixture maximum likelihood clustering
    Tawfick, Mohamad M.
    Abbas, Hazem M.
    Shahein, Hussein I.
    PATTERN RECOGNITION LETTERS, 2008, 29 (04) : 515 - 524
  • [38] Penalized likelihood inference for the finite mixture of Poisson distributions from capture-recapture data
    Liu, Yang
    Kuang, Rong
    Liu, Guanfu
    STATISTICAL PAPERS, 2024, 65 (05) : 2751 - 2773
  • [39] Mixture models for ordinal data: a pairwise likelihood approach
    Ranalli, Monia
    Rocci, Roberto
    STATISTICS AND COMPUTING, 2016, 26 (1-2) : 529 - 547
  • [40] Mixture models for ordinal data: a pairwise likelihood approach
    Monia Ranalli
    Roberto Rocci
    Statistics and Computing, 2016, 26 : 529 - 547