NONEXCHANGEABLE RANDOM PARTITION MODELS FOR MICROCLUSTERING

Cited by: 2
Authors
Di Benedetto, Giuseppe [1]
Caron, Francois [1]
Teh, Yee Whye [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford, England
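Source
ANNALS OF STATISTICS | 2021 / Vol. 49 / No. 04
Funding
European Union Seventh Framework Programme; UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords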
Power-law; random partitions; completely random measure; stochastic process; sparse random graph; NORMALIZED RANDOM MEASURES; PRIORS;
DOI
10.1214/20-AOS2003
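Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract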
Many popular random partition models, such as the Chinese restaurant process and its two-parameter extension, fall in the class of exchangeable random partitions, and have found wide applicability in various fields. While the exchangeability assumption is sensible in many cases, it implies that the size of the clusters necessarily grows linearly with the sample size, and such a feature may be undesirable for some applications. We present here a flexible class of nonexchangeable random partition models, which are able to generate partitions whose cluster sizes grow sublinearly with the sample size, and where the growth rate is controlled by one parameter. Along with this result, we provide the asymptotic behaviour of the number of clusters of a given size, and show that the model can exhibit a power-law behaviour, controlled by another parameter. The construction is based on completely random measures and a Poisson embedding of the random partition, and inference is performed using a Sequential Monte Carlo algorithm. Experiments on real data sets emphasise the usefulness of the approach compared to a two-parameter Chinese restaurant process.
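As a point of reference for the abstract, the following is a minimal sketch of the exchangeable baseline it mentions, the two-parameter Chinese restaurant process. It is not the nonexchangeable model proposed in the paper. The function name `sample_two_param_crp` and the parameter names `alpha` (concentration) and `sigma` (discount) are illustrative choices, and the seating rule used is the standard textbook one: a new cluster is opened with probability proportional to `alpha + sigma * k` (with `k` the current number of clusters), and an existing cluster of size `s` is joined with probability proportional to `s - sigma`.

```python
import random


def sample_two_param_crp(n, alpha, sigma, seed=0):
    """Sequentially assign n items to clusters; return the list of cluster sizes.

    Illustrative sketch of the two-parameter Chinese restaurant process,
    assuming 0 <= sigma < 1 and alpha > -sigma.
    """
    if not (0 <= sigma < 1 and alpha > -sigma):
        raise ValueError("require 0 <= sigma < 1 and alpha > -sigma")
    rng = random.Random(seed)
    sizes = []
    for i in range(n):  # i items have already been assigned
        if not sizes:
            sizes.append(1)  # the first item always opens a cluster
            continue
        k = len(sizes)
        # Total unnormalised weight: (alpha + sigma*k) + sum(s - sigma) = alpha + i
        u = rng.random() * (alpha + i)
        new_weight = alpha + sigma * k
        if u < new_weight:
            sizes.append(1)
            continue
        u -= new_weight
        for j, s in enumerate(sizes):
            u -= s - sigma
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


if __name__ == "__main__":
    for n in (1_000, 4_000):
        sizes = sample_two_param_crp(n, alpha=1.0, sigma=0.5, seed=1)
        print(n, len(sizes), max(sizes) / n)
```

Running the script for increasing `n` and comparing `max(sizes) / n` gives an informal look at how the largest cluster scales with the sample size under this exchangeable model, which is the behaviour the abstract contrasts with sublinear (microclustering) growth.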
Pages: 1931 - 1957
Number of pages: 27
Related papers
50 in total
  • [1] Random Partition Models for Microclustering Tasks
    Betancourt, Brenda
    Zanella, Giacomo
    Steorts, Rebecca C.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (539) : 1215 - 1227
  • [2] Differentiable Random Partition Models
    Sutter, Thomas M.
    Ryser, Alain
    Liebeskind, Joram
    Vogt, Julia E.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Random partition models with regression on covariates
    Mueller, Peter
    Quintana, Fernando
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (10) : 2801 - 2808
  • [4] Noncrossing partition flow and random matrix models
    Pernici, Mario
    arXiv, 2021,
  • [5] Similarity analysis in Bayesian random partition models
    Navarrete, Carlos A.
    Quintana, Fernando A.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2011, 55 (01) : 97 - 109
  • [6] Flexible Models for Microclustering with Application to Entity Resolution
    Zanella, Giacomo
    Betancourt, Brenda
    Wallach, Hanna
    Miller, Jeffrey
    Zaidi, Abbas
    Steorts, Rebecca C.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [7] Factorisations for partition functions of random Hermitian matrix models
    Jackson, DM
    Perry, MJ
    Visentin, TI
    COMMUNICATIONS IN MATHEMATICAL PHYSICS, 1996, 179 (01) : 25 - 59
  • [8] Nonparametric Bayes local partition models for random effects
    Dunson, David B.
    BIOMETRIKA, 2009, 96 (02) : 249 - 262
  • [9] Why the Rich Get Richer? On the Balancedness of Random Partition Models
    Lee, Changwoo J.
    Sang, Huiyan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [10] Random Partition Models and Exchangeability for Bayesian Identification of Population Structure
    Corander, Jukka
    Gyllenberg, Mats
    Koski, Timo
    BULLETIN OF MATHEMATICAL BIOLOGY, 2007, 69 : 797 - 815