Mixture of Networks for Clustering Categorical Data: A Penalized Composite Likelihood Approach

Authors
Baek, Jangsun [1]
Park, Jeong-Soo [1]
Affiliation
[1] Chonnam Natl Univ, Dept Stat, Gwangju, South Korea
Source
AMERICAN STATISTICIAN | 2023 / Vol. 77 / Issue 03
Funding
National Research Foundation, Singapore;
Keywords
Categorical data; Model-based clustering; Networks; Penalized composite likelihood; K-MEANS ALGORITHM; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; LATENT; ANALYZERS;
DOI
10.1080/00031305.2022.2141856
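Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code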
020208 ; 070103 ; 0714 ;
Abstract
One of the challenges in clustering categorical data is the curse of dimensionality caused by the inherent sparsity of high-dimensional data, the records of which include a large number of attributes. The latent class model (LCM) assumes local independence between the variables within clusters, and is a parsimonious model-based clustering approach that has been used to circumvent the problem. A mixture of log-linear models is more flexible but requires more parameters to be estimated. In this research, we recognize that each categorical observation can be conceived as a network with pairwise linked nodes, which are the response levels of the observation attributes. Therefore, the categorical data to be clustered are considered a finite mixture of different component layer networks with distinct patterns. We apply a penalized composite likelihood approach to a finite mixture of networks for sparse multivariate categorical data to reduce the number of parameters, implement the EM algorithm to estimate the model parameters, and show that the estimates are consistent and asymptotically normal. The proposed approach is shown to perform better than conventional methods on both synthetic and real datasets.
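The network view of a categorical observation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `observation_to_edges` and `edge_counts` and the toy data are hypothetical, and the code covers only the representation step (nodes as attribute-level pairs, one link per pair of attributes), not the penalized composite likelihood or the EM estimation.

```python
from collections import Counter
from itertools import combinations


def observation_to_edges(obs):
    """Turn one categorical record into its pairwise links.

    Each node is an (attribute index, response level) pair; every pair of
    attributes in the record contributes one link, so a record with p
    attributes yields p * (p - 1) / 2 links.
    """
    nodes = [(j, level) for j, level in enumerate(obs)]
    return list(combinations(nodes, 2))


def edge_counts(data):
    """Count how often each link occurs across all records (illustrative only)."""
    counts = Counter()
    for obs in data:
        counts.update(observation_to_edges(obs))
    return counts


# Hypothetical toy data: three records with three categorical attributes each.
data = [
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "small", "no"),
]

edges = observation_to_edges(data[0])
counts = edge_counts(data)
```

Under these assumptions, records that share many links have similar networks, which is the kind of pattern a mixture component would be expected to capture.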
Pages: 259-273
Number of pages: 15