MODEL-BASED CLUSTERING OF HIGH-DIMENSIONAL DATA IN ASTROPHYSICS

Cited by: 3
Authors
Bouveyron, C. [1, 2]
Affiliations
[1] Univ Paris 05, UMR CNRS 8145, Lab MAP5, Paris, France
[2] Sorbonne Paris Cite, Paris, France
Keywords
VARIABLE SELECTION; DISCRIMINANT-ANALYSIS; MIXTURE
DOI
10.1051/eas/1677006
Chinese Library Classification number
P1 [Astronomy]
Subject classification code
0704
Abstract
As in other scientific fields, the nature of data in astrophysics has changed over the past decades owing to increased measurement capabilities. As a consequence, data are now frequently high-dimensional and available in bulk or as streams. Model-based clustering techniques are popular tools, renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques perform poorly in high-dimensional spaces, mainly because of their severe over-parametrization. Recent developments in model-based classification overcome these drawbacks and make it possible to classify high-dimensional data efficiently, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods, and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in astrophysics using R packages.
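The over-parametrization mentioned in the abstract can be made concrete by counting free parameters. The following is a minimal sketch, not taken from the paper: it assumes a Gaussian mixture with K components in p dimensions and uses the standard counts for mixing proportions (K - 1), means (Kp), and covariance matrices under three common structures. The function name `gmm_param_count` and the structure labels are hypothetical, chosen only for illustration.

```python
def gmm_param_count(n_components: int, n_features: int, covariance: str = "full") -> int:
    """Count the free parameters of a Gaussian mixture model.

    Illustrative helper (not from the paper). Covariance structures:
      "full"      - one unconstrained symmetric matrix per component
      "diag"      - one diagonal matrix per component
      "spherical" - one variance scalar per component
    """
    k, p = n_components, n_features
    weights = k - 1          # mixing proportions sum to one
    means = k * p            # one mean vector per component
    if covariance == "full":
        cov = k * p * (p + 1) // 2
    elif covariance == "diag":
        cov = k * p
    elif covariance == "spherical":
        cov = k
    else:
        raise ValueError(f"unknown covariance structure: {covariance!r}")
    return weights + means + cov


if __name__ == "__main__":
    # Parameter counts for K = 4 components in p = 100 dimensions.
    for structure in ("full", "diag", "spherical"):
        print(structure, gmm_param_count(4, 100, structure))
```

With K = 4 and p = 100, the full-covariance model has 20603 free parameters, against 803 for the diagonal model and 407 for the spherical one. The quadratic growth in p of the full model is what makes estimation unreliable when the number of observations is small relative to the dimension.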
Pages: 91-119
Number of pages: 29
Related papers
50 in total
  • [1] Model-based clustering of high-dimensional data: A review
    Bouveyron, Charles
    Brunet-Saumard, Camille
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 52 - 78
  • [2] Model-based regression clustering for high-dimensional data: application to functional data
    Devijver, Emilie
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (02) : 243 - 279
  • [3] Model-based clustering of high-dimensional longitudinal data via regularization
    Yang, Luoying
    Wu, Tong Tong
    [J]. BIOMETRICS, 2023, 79 (02) : 761 - 774
  • [4] Variable selection for model-based high-dimensional clustering
    Wang, Sijian
    Zhu, Ji
    [J]. PREDICTION AND DISCOVERY, 2007, 443 : 177 - +
  • [5] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    [J]. APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 886 - 890
  • [6] Pairwise Variable Selection for High-Dimensional Model-Based Clustering
    Guo, Jian
    Levina, Elizaveta
    Michailidis, George
    Zhu, Ji
    [J]. BIOMETRICS, 2010, 66 (03) : 793 - 804
  • [7] Model-based multifacet clustering with high-dimensional omics applications
    Zong, Wei
    Li, Danyang
    Seney, Marianne L.
    Mcclung, Colleen A.
    Tseng, George C.
    [J]. BIOSTATISTICS, 2024,
  • [8] Model-based clustering of high-dimensional data: Variable selection versus facet determination
    Poon, Leonard K. M.
    Zhang, Nevin L.
    Liu, Tengfei
    Liu, April H.
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2013, 54 (01) : 196 - 215
  • [9] Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA
    Bellas, Anastasios
    Bouveyron, Charles
    Cottrell, Marie
    Lacaille, Jérôme
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2013, 7 (03) : 281 - 300