Replica analysis of Bayesian data clustering

Authors
Mozeika, Alexander [1]
Coolen, Anthony C. C. [2, 3]
Affiliations
[1] Kings Coll London, Inst Math & Mol Biomed, Hodgkin Bldg, London SE1 1UL, England
[2] Kings Coll London, Dept Math, London WC2R 2LS, England
[3] London Inst Math Sci, 35A South St, London W1K 2XF, England
Funding
UK Medical Research Council
Keywords
clustering; Bayesian inference; replica; statistical mechanics; classification
DOI
10.1088/1751-8121/ab59af
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
We use statistical mechanics to study model-based Bayesian data clustering. In this approach, each partition of the data into clusters is regarded as a microscopic system state, the negative data log-likelihood gives the energy of each state, and the data set realisation acts as disorder. Optimal clustering corresponds to the ground state of the system, and is hence obtained from the free energy via a low 'temperature' limit. We assume that for large sample sizes the free energy density is self-averaging, and we use the replica method to compute the asymptotic free energy density. The main order parameter in the resulting (replica symmetric) theory, the distribution of the data over the clusters, satisfies a self-consistent equation which can be solved by a population dynamics algorithm. From this order parameter one computes the average free energy, and all relevant macroscopic characteristics of the problem. The theory describes numerical experiments perfectly, and gives a significant improvement over the mean-field theory that was used to study this model in the past.
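The mapping described in the abstract (partitions as states, negative log-likelihood as energy, optimal clustering as the low-temperature limit) can be illustrated on a toy scale. The sketch below is not the paper's replica calculation: it simply enumerates every assignment of a few one-dimensional points to clusters. The unit-variance Gaussian likelihood with maximum-likelihood cluster means, and the names `energy` and `gibbs_over_partitions`, are assumptions made only for this illustration.

```python
import itertools
import math


def energy(data, labels, k):
    """Energy of one partition: negative Gaussian log-likelihood.

    Assumption for illustration only: each cluster is a unit-variance
    Gaussian whose mean is the maximum-likelihood estimate (the sample
    mean of the cluster's members).
    """
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing
        mu = sum(members) / len(members)
        total += sum(
            0.5 * (x - mu) ** 2 + 0.5 * math.log(2 * math.pi) for x in members
        )
    return total


def gibbs_over_partitions(data, k, beta):
    """Boltzmann distribution over all k**N label assignments.

    Brute-force enumeration, so only feasible for very small N.
    Returns the states, their energies, their probabilities, and the
    free energy F = -(1/beta) * log Z.
    """
    states = list(itertools.product(range(k), repeat=len(data)))
    energies = [energy(data, s, k) for s in states]
    e_min = min(energies)
    # Subtract e_min before exponentiating for numerical stability.
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    free_energy = e_min - math.log(z) / beta
    return states, energies, probs, free_energy


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    for beta in (0.01, 1.0, 50.0):
        states, energies, probs, f = gibbs_over_partitions(data, 2, beta)
        best = max(range(len(states)), key=lambda i: probs[i])
        print(beta, states[best], round(probs[best], 4), round(f, 4))
```

At small `beta` (high temperature) the distribution is nearly uniform over partitions; as `beta` grows the probability mass concentrates on the minimum-energy partitions and the free energy approaches the ground-state energy. The ground state here is doubly degenerate because relabelling the two clusters leaves the energy unchanged.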
Pages: 32