A Hierarchical Gamma Mixture Model-Based Method for Classification of High-Dimensional Data

Authors
Azhar, Muhammad [1]
Li, Mark Junjie [1]
Huang, Joshua Zhexue [1]
Affiliation
[1] Shenzhen University, College of Computer Science and Software Engineering, Shenzhen 518060, People's Republic of China
Keywords
data mining; unsupervised classification; decision cluster; gamma mixture model; expectation maximization; high-dimensional data; curse of dimensionality; SELECTION; INTERNET
DOI
10.3390/e21090906
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Data classification is an important research topic in data mining. With the rapid development of social media sites and IoT devices, data have grown tremendously in volume and complexity, producing many large, complex, high-dimensional data sets. Classifying such data when the number of classes is large remains a great challenge for current state-of-the-art methods. This paper presents a novel, hierarchical, gamma mixture model-based unsupervised method for classifying high-dimensional data with a large number of classes. In this method, we first partition the features of the data set into feature strata by using k-means. Then, a set of subspace data sets is generated from the feature strata by using the stratified subspace sampling method. After that, the GMM Tree algorithm is used to identify the number of clusters and the initial cluster centers in each subspace data set, and these initial cluster centers are passed to k-means to generate base subspace clustering results. The subspace clustering results are then integrated into an object cluster association (OCA) matrix by using the link-based method. The ensemble clustering result is generated from the OCA matrix by the k-means algorithm, with the number of clusters identified by the GMM Tree algorithm. After the ensemble clustering result is produced, the purity of each cluster is computed and the dominant class label is assigned to it. A new object is classified by computing the distance between it and the center of each cluster in the classifier; the object receives the class label of the cluster whose center is nearest. A series of experiments was conducted on twelve synthetic and eight real-world data sets with different numbers of classes, features, and objects. The experimental results show that the new method outperforms other state-of-the-art classification techniques on most of the data sets.
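The last two steps of the abstract (labeling each cluster with its dominant class, then classifying a new object by its nearest cluster center) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `label_clusters` and `classify`, the toy data, and the choice of Euclidean distance are all assumptions made for illustration, since the abstract does not say which distance is used.

```python
from collections import Counter
from math import dist  # Euclidean distance (an assumption; the abstract only says "distance")


def label_clusters(assignments, labels):
    """Map each cluster id to (dominant class label, purity).

    Purity here is the fraction of a cluster's members that carry
    the cluster's most frequent class label.
    """
    members_by_cluster = {}
    for cluster_id, label in zip(assignments, labels):
        members_by_cluster.setdefault(cluster_id, []).append(label)

    labeled = {}
    for cluster_id, members in members_by_cluster.items():
        dominant, count = Counter(members).most_common(1)[0]
        labeled[cluster_id] = (dominant, count / len(members))
    return labeled


def classify(point, centers, cluster_labels):
    """Return the class label of the cluster whose center is nearest to point."""
    nearest = min(centers, key=lambda cid: dist(point, centers[cid]))
    return cluster_labels[nearest][0]


# Hypothetical toy input: cluster assignments from an ensemble clustering
# step, the known class labels of the same objects, and the cluster centers.
assignments = [0, 0, 0, 1, 1]
labels = ["a", "a", "b", "b", "b"]
centers = {0: (0.0, 0.0), 1: (5.0, 5.0)}

cluster_labels = label_clusters(assignments, labels)
prediction = classify((4.0, 4.5), centers, cluster_labels)
```

In this toy input, cluster 0 holds two objects of class "a" and one of class "b", so it is labeled "a" with purity 2/3, while cluster 1 is pure. The new object lies closer to the center of cluster 1 and therefore inherits that cluster's label.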
Pages: 21