MODEL-BASED CLUSTERING OF HIGH-DIMENSIONAL DATA IN ASTROPHYSICS

Cited by: 3
Authors
Bouveyron, C. [1, 2]
Affiliations
[1] Univ Paris 05, UMR CNRS 8145, Lab MAP5, Paris, France
[2] Sorbonne Paris Cite, Paris, France
Keywords
VARIABLE SELECTION; DISCRIMINANT-ANALYSIS; MIXTURE
DOI
10.1051/eas/1677006
Chinese Library Classification
P1 [Astronomy]
Discipline classification code
0704
Abstract
As in other scientific fields, the nature of data in astrophysics has changed over the past decades owing to increased measurement capabilities. As a consequence, data are now frequently high-dimensional and arrive in bulk or as streams. Model-based clustering techniques are popular tools, renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques behave disappointingly in high-dimensional spaces, mainly because they are severely over-parametrized. Recent developments in model-based classification overcome these drawbacks and make it possible to classify high-dimensional data efficiently, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods, and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in astrophysics using R packages.
Pages: 91-119
Page count: 29