Dirichlet compound negative multinomial mixture models and applications

Times cited: 0
Authors
Bregu, Ornela [1]
Bouguila, Nizar [1]
Affiliation
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ, Canada
Keywords
Count data; Exact Fisher information matrix; Exponential approximation; Kullback-Leibler divergence; Agglomerative hierarchical clustering; Mixture models; Classification; Algorithms; Selection
DOI
10.1007/s11634-024-00598-2
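Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714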
Abstract
In this paper, we consider an alternative parametrization of the Dirichlet Compound Negative Multinomial (DCNM) distribution using rising polynomials. First, the new parametrization eliminates the Gamma functions and allows us to derive the exact Fisher information matrix, which significantly improves model performance because correlations between features are taken into account. Second, we propose to improve computational efficiency by approximating the DCNM model as a member of the exponential family of distributions, called EDCNM. Compared with the DCNM model, EDCNM offers several advantages, such as a closed-form solution for maximum likelihood estimation and shorter computation time on sparse datasets. Third, we implement agglomerative hierarchical clustering, in which the Kullback-Leibler divergence between two EDCNM distributions is derived and used as the distance measure. Finally, we integrate the Minimum Message Length (MML) criterion into our algorithm to estimate the optimal number of mixture components. The merits of the proposed models are validated on challenging real-world applications in natural language processing and image/video recognition. The results show that the exponential approximation of the DCNM model significantly reduces computational complexity in high-dimensional feature spaces.
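Two ideas from the abstract can be illustrated with a minimal sketch. Assuming the textbook DCNM form (a negative multinomial with shape beta compounded with a Dirichlet over (p_0, ..., p_D) with parameters alpha_0, ..., alpha_D), every count-dependent Gamma ratio Gamma(a + k) / Gamma(a) equals the rising factorial a^(k) = a(a + 1)...(a + k - 1). The likelihood can therefore be evaluated with finite products over the observed counts, leaving only a count-independent normalizer. This is an illustrative rewrite and not necessarily the paper's exact parametrization. The second function is the general KL identity for a regular exponential family, KL(theta1 || theta2) = A(theta2) - A(theta1) - (theta2 - theta1) . grad A(theta1). The EDCNM log-partition A is not given in this record, so a Poisson family stands in for it. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def log_rising(a: float, k: int) -> float:
    """log of the rising factorial a^(k) = a (a+1) ... (a+k-1), integer k >= 0."""
    return sum(math.log(a + j) for j in range(k))


def dcnm_logpmf_gamma(x: Sequence[int], beta: float, alpha: Sequence[float]) -> float:
    """Assumed textbook DCNM log-pmf written with Gamma functions.

    alpha = (alpha_0, alpha_1, ..., alpha_D); x = (x_1, ..., x_D).
    """
    n = sum(x)
    a_sum = sum(alpha)
    out = math.lgamma(beta + n) - math.lgamma(beta)
    out -= sum(math.lgamma(xd + 1) for xd in x)  # log of prod x_d!
    out += math.lgamma(a_sum) - math.lgamma(a_sum + beta + n)
    out += math.lgamma(alpha[0] + beta) - math.lgamma(alpha[0])
    out += sum(math.lgamma(ad + xd) - math.lgamma(ad) for ad, xd in zip(alpha[1:], x))
    return out


def dcnm_logpmf_rising(x: Sequence[int], beta: float, alpha: Sequence[float]) -> float:
    """Same log-pmf, with every count-dependent Gamma ratio as a rising factorial."""
    n = sum(x)
    a_sum = sum(alpha)
    # Data-dependent part: finite products over the observed counts only.
    data_term = log_rising(beta, n) - log_rising(a_sum + beta, n)
    data_term += sum(
        log_rising(ad, xd) - log_rising(1.0, xd)  # 1^(x_d) = x_d!
        for ad, xd in zip(alpha[1:], x)
    )
    # Count-independent normalizer: constant per component, computable once.
    const = (math.lgamma(a_sum) - math.lgamma(a_sum + beta)
             + math.lgamma(alpha[0] + beta) - math.lgamma(alpha[0]))
    return data_term + const


def expfam_kl(theta1: List[float], theta2: List[float],
              log_partition: Callable[[List[float]], float],
              grad_log_partition: Callable[[List[float]], List[float]]) -> float:
    """KL(p_theta1 || p_theta2) for a regular exponential family (Bregman form)."""
    g = grad_log_partition(theta1)
    inner = sum((t2 - t1) * gi for t1, t2, gi in zip(theta1, theta2, g))
    return log_partition(theta2) - log_partition(theta1) - inner


if __name__ == "__main__":
    x = [3, 0, 5, 1]                     # D = 4 count features
    beta, alpha = 2.5, [1.2, 0.7, 0.4, 1.9, 0.3]
    print(dcnm_logpmf_gamma(x, beta, alpha), dcnm_logpmf_rising(x, beta, alpha))
    # Poisson stand-in: natural parameter theta = log(lambda), A(theta) = exp(theta).
    A = lambda th: math.exp(th[0])
    grad_A = lambda th: [math.exp(th[0])]
    print(expfam_kl([math.log(2.0)], [math.log(3.0)], A, grad_A))
```

Both log-pmf functions should agree up to floating-point error. A symmetrized divergence such as KL(p || q) + KL(q || p) could then serve as the merge criterion in agglomerative clustering, merging the closest pair of components at each step; that step is not shown here.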
Pages: 36