Unsupervised nested Dirichlet finite mixture model for clustering

Cited by: 2
Authors
Alkhawaja, Fares [1 ]
Bouguila, Nizar [1 ]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ, Canada
Keywords
Nested Dirichlet distribution; Dirichlet-tree distribution; Minimum message length; Finite mixtures; Hierarchical learning; Generalized Dirichlet; Information; Framework
DOI
10.1007/s10489-023-04888-8
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Dirichlet distribution is widely used in the context of mixture models. Despite its flexibility, it still suffers from some limitations, such as its restrictive covariance matrix and the direct proportionality between its mean and variance. In this work, a generalization of the Dirichlet distribution, namely the Nested Dirichlet distribution, is introduced in the context of finite mixture models; thanks to its hierarchical structure, it provides more flexibility and overcomes the aforementioned drawbacks. Model learning is based on the generalized expectation-maximization algorithm, where the parameters are initialized with the method of moments and estimated through the iterative Newton-Raphson method. Moreover, the minimum message length criterion is used to determine the number of components that best describes the data clusters in the finite mixture model. The Nested Dirichlet distribution is shown to belong to the exponential family, which offers several advantages, such as closed-form expressions for several probabilistic distances. The performance of the Nested Dirichlet mixture model is compared with that of the Dirichlet mixture model, the generalized Dirichlet mixture model, and a Convolutional Neural Network as a deep learning baseline, and the proposed framework is validated through this comparison on challenging datasets. The hierarchical nature of the model is also applied to challenging real-world tasks such as hierarchical cluster analysis and hierarchical feature learning, showing a significant improvement in accuracy.
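To make the workflow described in the abstract concrete, the Python sketch below (not taken from the paper; every function name in it is hypothetical) fits finite mixtures for several candidate numbers of components with an EM loop and selects among them with a penalized-likelihood score. For brevity it substitutes an ordinary Dirichlet component for the nested Dirichlet, a generic L-BFGS optimizer for the paper's Newton-Raphson updates, random initialization for the method of moments, and a BIC-style penalty as a rough stand-in for the minimum message length criterion.

# Illustrative sketch only (not the authors' code): EM for a finite Dirichlet
# mixture plus a BIC-style model-selection score standing in for MML.
import numpy as np
from scipy.special import gammaln, logsumexp
from scipy.optimize import minimize

def dirichlet_logpdf(X, alpha):
    # Log-density of points X (N, D) on the simplex under Dirichlet(alpha).
    return (gammaln(alpha.sum()) - gammaln(alpha).sum()
            + ((alpha - 1.0) * np.log(X)).sum(axis=1))

def fit_dirichlet(X, w):
    # Weighted maximum-likelihood fit of one component (w = responsibilities);
    # L-BFGS in log-space replaces the paper's Newton-Raphson updates.
    mean_log = (w[:, None] * np.log(X)).sum(axis=0) / w.sum()
    def neg_ll(log_a):
        a = np.exp(log_a)
        return -(gammaln(a.sum()) - gammaln(a).sum() + ((a - 1.0) * mean_log).sum())
    res = minimize(neg_ll, x0=np.zeros(X.shape[1]), method="L-BFGS-B")
    return np.exp(res.x)

def em_mixture(X, K, n_iter=50, seed=0):
    # EM fit of a K-component Dirichlet mixture; returns weights, parameters,
    # and the final total log-likelihood.
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    alphas = rng.uniform(0.5, 5.0, size=(K, D))
    for _ in range(n_iter):
        # E-step: posterior responsibilities in log-space.
        log_r = np.log(pi) + np.column_stack([dirichlet_logpdf(X, a) for a in alphas])
        log_r -= logsumexp(log_r, axis=1, keepdims=True)
        r = np.exp(log_r)
        # M-step: mixing weights and per-component parameters.
        pi = r.mean(axis=0)
        alphas = np.array([fit_dirichlet(X, r[:, k]) for k in range(K)])
    ll = logsumexp(np.log(pi) + np.column_stack(
        [dirichlet_logpdf(X, a) for a in alphas]), axis=1).sum()
    return pi, alphas, ll

# Toy data: two Dirichlet clusters on the 3-dimensional simplex.
rng = np.random.default_rng(1)
X = np.vstack([rng.dirichlet([8, 2, 2], 300), rng.dirichlet([2, 2, 8], 300)])

best = None
for K in range(1, 5):
    pi, alphas, ll = em_mixture(X, K)
    n_params = K * X.shape[1] + (K - 1)            # component parameters + weights
    score = -ll + 0.5 * n_params * np.log(len(X))  # BIC-style stand-in for MML
    if best is None or score < best[0]:
        best = (score, K, pi)
print("selected K =", best[1], "mixing weights =", np.round(best[2], 3))

On this toy data the score is minimized at K = 2, mirroring how the paper's MML criterion is used to pick the number of mixture components before interpreting the clusters.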
Pages: 25232-25258
Page count: 27
Related papers
50 items in total
  • [41] Object Clustering With Dirichlet Process Mixture Model for Data Association in Monocular SLAM
    Wei, Songlin
    Chen, Guodong
    Chi, Wenzheng
    Wang, Zhenhua
    Sun, Lining
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2023, 70 (01) : 594 - 603
  • [42] ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data
    Osmala, Maria
    Eraslan, Gokcen
    Lahdesmaki, Harri
    [J]. BIOINFORMATICS, 2022, 38 (16) : 3863 - 3870
  • [43] Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model
    Tawara, Naohiro
    Watanabe, Shinji
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2916 - +
  • [44] Tensor Dirichlet Process Multinomial Mixture Model with Graphs for Passenger Trajectory Clustering
    Li, Ziyue
    Yan, Hao
    Zhang, Chen
    Ketter, Wolfgang
    Tsung, Fugee
    [J]. PROCEEDINGS OF THE 6TH ACM SIGSPATIAL INTERNATIONAL WORKSHOP ON AI FOR GEOGRAPHIC KNOWLEDGE DISCOVERY, GEOAI 2023, 2023, : 121 - 128
  • [45] A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering
    Yin, Jianhua
    Wang, Jianyong
    [J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 233 - 242
  • [46] Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model
    Dahl, David B.
    Mo, Qianxing
    Vannucci, Marina
    [J]. STATISTICAL MODELLING, 2008, 8 (01) : 23 - 39
  • [47] Robust simultaneous positive data clustering and unsupervised feature selection using generalized inverted Dirichlet mixture models
    Al Mashrgy, Mohamed
    Bdiri, Taoufik
    Bouguila, Nizar
    [J]. KNOWLEDGE-BASED SYSTEMS, 2014, 59 : 182 - 195
  • [48] Assessing Search and Unsupervised Clustering Algorithms in Nested Sampling
    Maillard, Lune
    Finocchi, Fabio
    Trassinelli, Martino
    [J]. ENTROPY, 2023, 25 (02)
  • [49] Nested Dolls: Towards Unsupervised Clustering of Web Tables
    Khan, Rituparna
    Gubanov, Michael
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 5357 - 5359
  • [50] Data Clustering Using Variational Learning of Finite Scaled Dirichlet Mixture Models with Component Splitting
    Hieu Nguyen
    Maanicshah, Kamal
    Azam, Muhammad
    Bouguila, Nizar
    [J]. IMAGE ANALYSIS AND RECOGNITION (ICIAR 2019), PT II, 2019, 11663 : 117 - 128