Robust simultaneous positive data clustering and unsupervised feature selection using generalized inverted Dirichlet mixture models

Cited by: 31
Authors
Al Mashrgy, Mohamed [1]
Bdiri, Taoufik [1]
Bouguila, Nizar [2]
Affiliations
[1] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ H3G 1T7, Canada
[2] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Positive data; Generalized inverted Dirichlet; Finite mixture; Feature selection; Outliers; Model selection; Images clustering; Variable selection; Regression
DOI
10.1016/j.knosys.2014.01.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The discovery, extraction and analysis of knowledge from data generally rely on unsupervised learning methods, in particular clustering approaches. Much recent research in clustering and data engineering has focused on finite mixture models, which allow reasoning in the face of uncertainty and learning by example. Adopting these models becomes challenging in the presence of outliers and with high-dimensional data, which necessitates feature selection techniques. In this paper, we simultaneously tackle the problems of cluster validation (i.e., model selection), feature selection and outlier rejection when clustering positive data. The proposed statistical framework is based on the generalized inverted Dirichlet distribution, which offers a more practical and flexible alternative to the inverted Dirichlet, whose covariance structure is very restrictive. The parameters of the resulting model are learned by minimizing a message length objective that incorporates prior knowledge. We use synthetic data and real data from challenging applications, namely visual scene and object clustering, to demonstrate the feasibility and advantages of the proposed method. © 2014 Elsevier B.V. All rights reserved.
Pages: 182-195
Number of pages: 14