HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data

Authors
Berge, Laurent [1]
Bouveyron, Charles [2]
Girard, Stephane [3, 4]
Affiliations
[1] Univ Bordeaux IV, CNRS, UMR 5113, Lab GREThA, F-33608 Pessac, France
[2] Univ Paris 01, EA 4543, Lab SAMM, F-75013 Paris, France
[3] INRIA Rhone Alpes, Team Mistis, F-38330 Montbonnot St Martin, Saint Ismier, France
[4] LJK, F-38330 Montbonnot St Martin, Saint Ismier, France
Source
JOURNAL OF STATISTICAL SOFTWARE | 2012 / Volume 46 / Issue 06
Keywords
model-based classification; high-dimensional data; discriminant analysis; clustering; Gaussian mixture models; parsimonious models; class-specific subspaces; R package; MAXIMUM-LIKELIHOOD; EM ALGORITHM; SELECTION; MIXTURES;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
This paper presents the R package HDclassif, which is devoted to the clustering and discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the ideas of dimension reduction and model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA). Similarly, the associated clustering method is called high dimensional data clustering (HDDC) and uses the expectation-maximization algorithm for inference. In order to correctly fit the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than for other model-based methods, which allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R code allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is free software distributed under the General Public License, as part of the R software project.
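The parsimony claim in the abstract can be made concrete with a rough parameter count. The sketch below is illustrative only and is not taken from this record: it assumes the most general subspace model, in which each class k has its own subspace orientation, d_k variances inside that subspace, one noise variance outside it, and an intrinsic dimension d_k. The function names and the values p = 100, K = 4, d_k = 10 are hypothetical choices made for the example.

```python
def free_parameters_full_gmm(p, K):
    """Free parameters of a Gaussian mixture with unconstrained covariances.

    K*p mean coordinates, K-1 mixing proportions, and K symmetric
    p x p covariance matrices with p*(p+1)/2 entries each.
    """
    return K * p + (K - 1) + K * p * (p + 1) // 2


def free_parameters_subspace_model(p, dims):
    """Free parameters when class k is modeled in its own d_k-dim subspace.

    Assumed counting (illustrative, not quoted from the record):
      - means and proportions: K*p + (K - 1)
      - orientation: d_k orthonormal vectors in R^p, i.e. d_k*p - d_k*(d_k+1)/2
      - d_k variances inside the subspace of class k
      - one noise variance per class outside the subspace
      - one intrinsic dimension per class
    """
    K = len(dims)
    means_and_proportions = K * p + (K - 1)
    orientation = sum(d * p - d * (d + 1) // 2 for d in dims)
    subspace_variances = sum(dims)
    noise_variances = K
    intrinsic_dims = K
    return (means_and_proportions + orientation + subspace_variances
            + noise_variances + intrinsic_dims)


if __name__ == "__main__":
    p, K = 100, 4          # hypothetical dimension and number of classes
    dims = [10] * K        # hypothetical intrinsic dimensions d_k
    print(free_parameters_full_gmm(p, K))          # 20603
    print(free_parameters_subspace_model(p, dims)) # 4231
```

Under these assumptions the subspace model needs roughly a fifth of the parameters of the unconstrained mixture, and the gap widens as p grows because the covariance cost drops from quadratic in p to linear in p for fixed d_k.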
Pages: 1-29
Page count: 29
Related papers
50 in total
  • [41] Feature extraction and uncorrelated discriminant analysis for high-dimensional data
    Yang, Wen-Hui
    Dai, Dao-Qing
    Yan, Hong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (05) : 601 - 614
  • [42] Generalized Linear Discriminant Analysis for High-Dimensional Genomic Data
    Li, Sisi
    Lewinger, Juan Pablo
    [J]. GENETIC EPIDEMIOLOGY, 2017, 41 (07) : 704 - 704
  • [43] Discriminant analysis of high-dimensional data over limited samples
    V. I. Serdobolskii
    [J]. Doklady Mathematics, 2010, 81 : 75 - 77
  • [44] High-dimensional integrative copula discriminant analysis for multiomics data
    He, Yong
    Chen, Hao
    Sun, Hao
    Ji, Jiadong
    Shi, Yufeng
    Zhang, Xinsheng
    Liu, Lei
    [J]. STATISTICS IN MEDICINE, 2020, 39 (30) : 4869 - 4884
  • [45] Optimal Linear Discriminant Analysis for High-Dimensional Functional Data
    Xue, Kaijie
    Yang, Jin
    Yao, Fang
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (546) : 1055 - 1064
  • [46] Generalized linear discriminant analysis for high-dimensional genomic data
    Li, Sisi
    Lewinger, Juan Pablo
    [J]. GENETIC EPIDEMIOLOGY, 2018, 42 (07) : 713 - 713
  • [47] Diagonal Discriminant Analysis With Feature Selection for High-Dimensional Data
    Romanes, Sarah E.
    Ormerod, John T.
    Yang, Jean Y. H.
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (01) : 114 - 127
  • [48] Model-based Co-clustering for High Dimensional Sparse Data
    Salah, Aghiles
    Rogovschi, Nicoleta
    Nadif, Mohamed
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 866 - 874
  • [49] Forecasting Simultaneously High-Dimensional Time Series: A Robust Model-Based Clustering Approach
    Wang, Yongning
    Tsay, Ruey S.
    Ledolter, Johannes
    Shrestha, Keshab M.
    [J]. JOURNAL OF FORECASTING, 2013, 32 (08) : 673 - 684
  • [50] VARIABLE SELECTION AND UPDATING IN MODEL-BASED DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA WITH FOOD AUTHENTICITY APPLICATIONS
    Murphy, Thomas Brendan
    Dean, Nema
    Raftery, Adrian E.
    [J]. ANNALS OF APPLIED STATISTICS, 2010, 4 (01) : 396 - 421