Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cited by: 0
Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliation
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering;
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, which gives it an advantage in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. A pairwise constraint indicates the relationship between two data points and comes in two types: must-link (ML) and cannot-link (CL). We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, we introduce the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for C-DPM based on chunklets. We further propose a principled approach to select the most useful constraints, which are then incorporated into SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. Our SC-DPM achieves superior performance in data clustering. In addition, SC-DPM can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
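The abstract states that ML constraints are compiled into chunklets, that CL constraints hold between chunklets, and that Gibbs sampling operates on chunklets rather than on individual points. The Python sketch below illustrates that bookkeeping under stated assumptions; it is not the authors' implementation. It assumes that a chunklet is a connected component of the must-link graph and that the prior weight for moving a whole chunklet follows the standard DP partition probability. The function names (`build_chunklets`, `allowed_clusters`, `crp_prior_log_weights`) are hypothetical, and the likelihood term that a full C-DPM sampler would multiply in is omitted.

```python
import math
from collections import defaultdict


def build_chunklets(n_points, must_links, cannot_links):
    """Compile ML constraints into chunklets and lift CL constraints to chunklet pairs.

    Assumption: a chunklet is a connected component of the must-link graph
    (the transitive closure of the ML constraints); an unconstrained point
    forms a singleton chunklet.
    """
    parent = list(range(n_points))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Give each component a compact id in order of first appearance.
    root_to_id = {}
    chunklet_of = []
    for i in range(n_points):
        root = find(i)
        if root not in root_to_id:
            root_to_id[root] = len(root_to_id)
        chunklet_of.append(root_to_id[root])

    members = defaultdict(list)
    for i, c in enumerate(chunklet_of):
        members[c].append(i)
    chunklets = [members[c] for c in range(len(root_to_id))]

    # A CL constraint between two points becomes a CL constraint between
    # their chunklets; a CL inside one chunklet is an unsatisfiable conflict.
    chunk_cl = set()
    for a, b in cannot_links:
        ca, cb = chunklet_of[a], chunklet_of[b]
        if ca == cb:
            raise ValueError(f"ML and CL constraints conflict on points {a} and {b}")
        chunk_cl.add((min(ca, cb), max(ca, cb)))
    return chunklets, chunk_cl


def allowed_clusters(chunk, assignment, chunk_cl, cluster_sizes):
    """Non-empty clusters that chunklet `chunk` may join without breaking a CL constraint.

    assignment[c] is the current cluster of chunklet c, or None for the
    chunklet being resampled; cluster_sizes excludes `chunk`'s own points.
    """
    forbidden = set()
    for a, b in chunk_cl:
        other = b if a == chunk else a if b == chunk else None
        if other is not None and assignment[other] is not None:
            forbidden.add(assignment[other])
    return [k for k, n in enumerate(cluster_sizes) if n > 0 and k not in forbidden]


def crp_prior_log_weights(chunk_size, cluster_sizes, allowed, alpha):
    """Unnormalised log prior weights for placing a whole chunklet of m points.

    From the DP partition probability p(partition) ∝ alpha^K * prod_k Gamma(n_k):
    joining existing cluster k contributes Gamma(n_k + m) / Gamma(n_k), and
    opening a new cluster contributes alpha * Gamma(m). Likelihood terms,
    which a full sampler would add, are omitted here.
    """
    log_w = {k: math.lgamma(cluster_sizes[k] + chunk_size) - math.lgamma(cluster_sizes[k])
             for k in allowed}
    log_w["new"] = math.log(alpha) + math.lgamma(chunk_size)
    return log_w


chunklets, chunk_cl = build_chunklets(6, [(0, 1), (1, 2), (3, 4)], [(2, 3)])
print(chunklets)  # [[0, 1, 2], [3, 4], [5]]
print(chunk_cl)   # {(0, 1)}
```

For example, when resampling chunklet 0 while chunklet 1 sits in cluster 0 and chunklet 2 in cluster 1, `allowed_clusters(0, [None, 0, 1], chunk_cl, [2, 1])` rules out cluster 0 because of the CL constraint, and `crp_prior_log_weights(3, [2, 1], [1], 1.0)` gives prior weights proportional to 6 for cluster 1 and 2 for a new cluster. Moving the whole chunklet at once is what keeps every ML constraint satisfied throughout sampling.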
Pages: 205-223
Page count: 18
Related papers (50 in total)
  • [41] Hybrid Dirichlet mixture models for functional data
    Petrone, Sonia
    Guindani, Michele
    Gelfand, Alan E.
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2009, 71 : 755 - 782
  • [42] A Dirichlet Process Mixture Model for Spherical Data
    Straub, Julian
    Chang, Jason
    Freifeld, Oren
    Fisher, John W., III
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 930 - 938
  • [43] Data Clustering Using Variational Learning of Finite Scaled Dirichlet Mixture Models with Component Splitting
    Nguyen, Hieu
    Maanicshah, Kamal
    Azam, Muhammad
    Bouguila, Nizar
    [J]. IMAGE ANALYSIS AND RECOGNITION (ICIAR 2019), PT II, 2019, 11663 : 117 - 128
  • [44] Document clustering with pairwise constraints
    Kreesuradej, W
    Suwanlamai, A
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2006, 20 (02) : 241 - 254
  • [45] Dirichlet Process Mixture Model for Document Clustering with Feature Partition
    Huang, Ruizhang
    Yu, Guan
    Wang, Zhaojun
    Zhang, Jun
    Shi, Liangxing
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (08) : 1748 - 1759
  • [46] THE DIRICHLET LABELING PROCESS FOR CLUSTERING FUNCTIONAL DATA
    Nguyen, XuanLong
    Gelfand, Alan E.
    [J]. STATISTICA SINICA, 2011, 21 (03) : 1249 - 1289
  • [47] The nested joint clustering via Dirichlet process mixture model
    Han, Shengtong
    Zhang, Hongmei
    Sheng, Wenhui
    Arshad, Hasan
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2019, 89 (05) : 815 - 830
  • [48] Mixture models for ordinal data: a pairwise likelihood approach
    Ranalli, Monia
    Rocci, Roberto
    [J]. STATISTICS AND COMPUTING, 2016, 26 (1-2) : 529 - 547
  • [50] A Dirichlet Process Mixture of Generalized Dirichlet Distributions for Proportional Data Modeling
    Bouguila, Nizar
    Ziou, Djemel
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2010, 21 (01) : 107 - 122