Bayesian Non-Parametric Clustering of Ranking Data

Cited by: 4
Authors
Meila, Marina [1]
Chen, Harr [2]
Affiliations
[1] Univ Washington, Stat, Seattle, WA 98195 USA
[2] Vat Labs, San Francisco, CA USA
Funding
National Science Foundation (USA);
Keywords
Rank data; top-t rankings; generalized Mallows model; Dirichlet process mixture; non-parametric clustering; DIRICHLET; INFERENCE;
DOI
10.1109/TPAMI.2016.2515599
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the estimation of Dirichlet process mixtures over discrete incomplete rankings. The generative model for each mixture component is the generalized Mallows (GM) model, an exponential family model for permutations which extends seamlessly to top-t rankings. While the GM is remarkably tractable in comparison with other permutation models, its conjugate prior is not. Our main contribution is to derive the theory and algorithms for sampling from the desired posterior distributions under this DPM. We introduce a family of partially collapsed Gibbs samplers, containing as one extreme point an exact algorithm based on slice sampling, and at the other a fast approximate sampler with superior mixing that is still very accurate in all but the lowest ranks. We empirically demonstrate the effectiveness of the approximation in reducing mixing time, the benefits of the Dirichlet process approach over alternative clustering techniques, and the applicability of the approach to exploring large real-world ranking datasets.
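To make the phrase "extends seamlessly to top-t rankings" concrete, the following is a minimal sketch of a GM log-probability for a top-t ranking. It assumes the common stagewise parameterization, in which the j-th ranked item contributes a code s_j (the number of still-unranked items that the central permutation places ahead of it) and a factor proportional to exp(-theta_j * s_j). The function names `gm_codes` and `gm_log_prob` are hypothetical, and this is not the authors' implementation.

```python
import math

def gm_codes(top_t, sigma):
    """Return the stagewise codes s_j of a top-t ranking relative to sigma.

    s_j counts the items that sigma places ahead of top_t[j] among the
    items not yet ranked at stage j (assumed parameterization).
    """
    remaining = list(sigma)
    codes = []
    for item in top_t:
        idx = remaining.index(item)  # position among the unranked items
        codes.append(idx)
        remaining.pop(idx)
    return codes

def gm_log_prob(top_t, sigma, theta):
    """Log-probability of a top-t ranking under the sketched GM model.

    theta holds one dispersion parameter per observed rank (len(theta) >= t).
    """
    n = len(sigma)
    logp = 0.0
    for j, (s, th) in enumerate(zip(gm_codes(top_t, sigma), theta)):
        m = n - j  # s_j can take the values 0 .. m-1 at stage j (0-indexed)
        log_z = math.log(sum(math.exp(-th * k) for k in range(m)))
        logp += -th * s - log_z
    return logp
```

Because each stage normalizes on its own, only the first t codes are needed for a top-t ranking, so no summation over completions of the ranking is required. For example, `gm_log_prob(["b", "a"], ["a", "b", "c", "d"], [1.0, 0.5])` scores the top-2 ranking (b, a) against the central permutation (a, b, c, d).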
Pages: 2156-2169
Page count: 14