Variable selection in model-based discriminant analysis

Cited by: 26
Authors
Maugis, C. [1]
Celeux, G. [2]
Martin-Magniette, M-L [3, 4]
Affiliations
[1] Univ Toulouse, INSA Toulouse, Inst Math Toulouse, F-31077 Toulouse 4, France
[2] Inria Saclay Ile de France, Sophia Antipolis, France
[3] UMR AgroParisTech INRA MIA 518, Paris, France
[4] ERL CNRS 8196, UEVE, URGV UMR INRA 1165, Evry, France
Keywords
Discriminant; redundant or independent variables; Variable selection; Gaussian classification models; Linear regression; BIC; CLASSIFICATION
DOI
10.1016/j.jmva.2011.05.004
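Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract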
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles for each possible predictor are considered: a variable can be a relevant classification predictor or not, and the irrelevant classification variables can either be linearly dependent on a subset of the relevant predictors or be independent of them. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed. It is optimized through two embedded forward stepwise variable selection algorithms for classification and linear regression. The model identifiability and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the value of this variable selection methodology. In particular, it is shown that this well-grounded variable selection model can be of great interest for improving the classification performance of quadratic discriminant analysis in a high-dimensional context. (C) 2011 Elsevier Inc. All rights reserved.
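The stepwise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it is a simplified greedy forward search in which a candidate variable is added when a class-specific Gaussian model on the enlarged set has a higher BIC than the current model plus a linear regression of the candidate on all currently selected variables. The paper's embedded stepwise regression (which picks a subset of regressors) and its separate handling of independent variables are omitted. All function names (`gaussian_loglik`, `bic_classification`, `bic_regression`, `forward_select`) are hypothetical, and BIC is taken here as 2 log-likelihood minus (number of parameters) log n, so larger is better.

```python
import numpy as np

def gaussian_loglik(X):
    """Maximized log-likelihood of one full-covariance Gaussian fitted by ML."""
    n, d = X.shape
    if d == 0:
        return 0.0  # no variables, nothing to model
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)

def bic_classification(XS, y):
    """BIC of class proportions plus one full-covariance Gaussian per class."""
    n, d = XS.shape
    classes = np.unique(y)
    loglik = 0.0
    for k in classes:
        Xk = XS[y == k]
        loglik += gaussian_loglik(Xk) + len(Xk) * np.log(len(Xk) / n)
    n_params = len(classes) * (d + d * (d + 1) // 2) + (len(classes) - 1)
    return 2 * loglik - n_params * np.log(n)

def bic_regression(xj, XS):
    """BIC of a Gaussian linear regression of xj on XS (with intercept)."""
    n, d = XS.shape
    design = np.column_stack([np.ones(n), XS])
    coef, *_ = np.linalg.lstsq(design, xj, rcond=None)
    resid = xj - design @ coef
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * loglik - (d + 2) * np.log(n)  # slopes + intercept + variance

def forward_select(X, y):
    """Greedy forward search: add the variable with the largest positive BIC gain."""
    p = X.shape[1]
    selected = []
    while True:
        idx = np.array(selected, dtype=int)
        base = bic_classification(X[:, idx], y)
        best_j, best_gain = None, 0.0
        for j in range(p):
            if j in selected:
                continue
            with_j = np.array(selected + [j], dtype=int)
            # "j is discriminant" versus "j is explained by regression on selected"
            gain = bic_classification(X[:, with_j], y) - (
                base + bic_regression(X[:, j], X[:, idx]))
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            return selected
        selected.append(best_j)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 200)
    x0 = rng.normal(size=400) + 3.0 * y    # discriminant variable
    x1 = 0.5 * x0 + rng.normal(size=400)   # redundant: linear in x0 plus noise
    x2 = rng.normal(size=400)              # independent noise
    print(forward_select(np.column_stack([x0, x1, x2]), y))
```

On data simulated as in the demo, one would expect variable 0 to be picked first, since its class-specific fit improves most over a single pooled Gaussian. The sketch assumes each class has enough observations for its covariance estimate to be non-singular, which is exactly what breaks down in the high-dimensional setting the abstract mentions.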
Pages: 1374-1387
Number of pages: 14