Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliation
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically and therefore performs well in data clustering. This paper investigates the influence of pairwise constraints on the DPM model. Pairwise constraints come in two types, must-link (ML) and cannot-link (CL), and each indicates the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). The C-DPM introduces the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for the C-DPM based on chunklets. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. The SC-DPM achieves superior clustering performance and can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
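The chunklet idea in the abstract can be illustrated with a minimal sketch. It assumes a chunklet is the transitive closure of the ML constraints (a common reading, not confirmed by the abstract) and that a CL constraint between two points is lifted to the chunklets containing them. The function name `build_chunklets` and the toy constraints are hypothetical and do not come from the paper; the Gibbs sampler itself is not shown.

```python
from collections import defaultdict


def build_chunklets(n_points, must_links, cannot_links):
    """Group points into chunklets and lift CL constraints to chunklet level.

    Assumption: a chunklet is a connected component of the must-link graph.
    """
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as the root so chunklet order is stable.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    chunklets = [groups[root] for root in sorted(groups)]
    chunklet_of = {p: c for c, members in enumerate(chunklets) for p in members}

    chunklet_cl = set()
    for a, b in cannot_links:
        ca, cb = chunklet_of[a], chunklet_of[b]
        if ca == cb:
            # A CL pair inside one chunklet contradicts the ML constraints.
            raise ValueError(f"cannot-link ({a}, {b}) conflicts with must-links")
        chunklet_cl.add((min(ca, cb), max(ca, cb)))
    return chunklets, chunklet_cl


# Toy example (hypothetical data): six points, three ML and two CL constraints.
chunklets, chunklet_cl = build_chunklets(
    6, must_links=[(0, 1), (1, 2), (4, 5)], cannot_links=[(2, 3), (0, 4)]
)
print(chunklets)            # [[0, 1, 2], [3], [4, 5]]
print(sorted(chunklet_cl))  # [(0, 1), (0, 2)]
```

In a sampler built on this representation, each chunklet would be assigned to a cluster as a unit, and two chunklets joined by a CL pair would be barred from sharing a cluster; that is one plausible design, stated here as an assumption.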
Pages: 205-223
Number of pages: 18