Sparse supervised principal component analysis (SSPCA) for dimension reduction and variable selection

Cited by: 32
Authors
Sharifzadeh, Sara [1 ]
Ghodsi, Ali [2 ]
Clemmensen, Line H. [1 ]
Ersboll, Bjarne K. [1 ]
Affiliations
[1] Tech Univ Denmark, Dept Appl Math & Comp Sci, DK-2800 Lyngby, Denmark
[2] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON, Canada
Keywords
Variable selection; Dimension reduction; Sparse PCA; Supervised PCA; Sparse supervised PCA; Penalized matrix decomposition; PREDICTION;
DOI: 10.1016/j.engappai.2017.07.004
Chinese Library Classification: TP [automation technology; computer technology]
Discipline code: 0812
Abstract
Principal component analysis (PCA) is one of the main unsupervised pre-processing methods for dimension reduction. When training labels are available, a supervised PCA strategy is worthwhile, and when both dimension reduction and variable selection are required, sparse PCA (SPCA) methods are preferred. In this paper, a sparse supervised PCA (SSPCA) method is proposed for pre-processing. It is especially appropriate for problems in which a high-dimensional input necessitates a sparse method and a target label is available to guide the variable selection. Such a method is valuable in many engineering and scientific problems, particularly when the number of training samples is also limited. The Hilbert-Schmidt independence criterion (HSIC) is used to form an objective based on minimization of a loss function, and an L1 norm is used to regularize the eigenvectors. While the proposed objective admits a sparse low-rank solution for both linear and non-linear relationships between the input and response matrices, other similar methods are based only on a linear model. The objective is solved with the penalized matrix decomposition (PMD) algorithm. We compare the proposed method with PCA, PMD-based SPCA and supervised PCA. In addition, SSPCA is compared with sparse partial least squares (SPLS), owing to the similarity between the two objective functions. Experimental results on simulated as well as real data sets show that SSPCA provides an appropriate trade-off between accuracy and sparsity. Comparisons show that, in terms of sparsity, SSPCA achieves the highest level of variable reduction and, in terms of accuracy, it is among the most successful methods. The eigenvectors found by SSPCA can therefore be used for feature selection in various high-dimensional problems. (C) 2017 Elsevier Ltd. All rights reserved.
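The abstract describes the method only at a high level: an HSIC-based objective whose L1-regularized leading eigenvectors are found by penalized matrix decomposition. The sketch below illustrates that idea for the first direction, using a linear kernel on the response; all names (`sspca_first_direction`, `l1_ball_update`), the kernel choice, and implementation details are assumptions for illustration, not the authors' reference code.

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l1_ball_update(a, c, tol=1e-8):
    """Soft-threshold `a` so that, after L2 normalization, its L1 norm
    is at most c (binary search on the threshold, PMD-style)."""
    v = a / (np.linalg.norm(a) + 1e-12)
    if np.abs(v).sum() <= c:
        return v  # no shrinkage needed
    lo, hi = 0.0, np.abs(a).max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        n2 = np.linalg.norm(s)
        if n2 == 0 or np.abs(s).sum() / n2 > c:
            lo = mid  # threshold too small: constraint still violated
        else:
            hi = mid  # constraint satisfied: try a smaller threshold
    s = soft_threshold(a, hi)
    return s / (np.linalg.norm(s) + 1e-12)

def sspca_first_direction(X, Y, c, n_iter=200):
    """First sparse supervised direction: a sparse leading eigenvector
    of Psi = X^T H L H X, where L is a kernel on the response
    (linear here; a nonlinear kernel would give the nonlinear variant)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    L = Y @ Y.T                           # linear response kernel (assumed)
    Psi = X.T @ H @ L @ H @ X             # HSIC-derived target matrix
    u = np.linalg.eigh(Psi)[1][:, -1]     # warm start: dense leading eigenvector
    for _ in range(n_iter):
        u = l1_ball_update(Psi @ u, c)    # power iteration with L1 projection
    return u

# Toy demo: only the first 3 of 20 variables drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))
y = (X[:, :3] @ np.array([2.0, -1.5, 1.0]))[:, None]
u = sspca_first_direction(X, y, c=2.0)
```

The returned direction has unit L2 norm and L1 norm at most `c`, so most of its entries are shrunk to zero; nonzero entries indicate the selected variables.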
Pages: 168-177 (10 pages)
Related papers
50 records in total
  • [1] Dimension selection for feature selection and dimension reduction with principal and independent component analysis
    Koch, Inge
    Naito, Kanta
    [J]. NEURAL COMPUTATION, 2007, 19 (02) : 513 - 545
  • [2] Supervised Sparse and Functional Principal Component Analysis
    Li, Gen
    Shen, Haipeng
    Huang, Jianhua Z.
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2016, 25 (03) : 859 - 878
  • [3] Dimension reduction in radio maps based on the supervised kernel principal component analysis
    Jia, Bing
    Huang, Baoqi
    Gao, Hepeng
    Li, Wuyungerile
    [J]. SOFT COMPUTING, 2018, 22 (23) : 7697 - 7703
  • [4] Learning sparse gradients for variable selection and dimension reduction
    Ye, Gui-Bo
    Xie, Xiaohui
    [J]. MACHINE LEARNING, 2012, 87 (03) : 303 - 355
  • [5] Dimension reduction in principal component analysis for trees
    Alfaro, Carlos A.
    Aydin, Burcu
    Valencia, Carlos E.
    Bullitt, Elizabeth
    Ladha, Alim
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 74 : 157 - 179
  • [6] Dimension reduction by local principal component analysis
    Kambhatla, N
    Leen, TK
    [J]. NEURAL COMPUTATION, 1997, 9 (07) : 1493 - 1516
  • [7] Sparse variable principal component analysis with application to fMRI
    Ulfarsson, Magnus O.
    Solo, Victor
    [J]. 2007 4TH IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING : MACRO TO NANO, VOLS 1-3, 2007, : 460 - +
  • [8] Sparse principal component analysis via variable projection
    Erichson, N. Benjamin
    Zheng, Peng
    Manohar, Krithika
    Brunton, Steven L.
    Kutz, J. Nathan
    Aravkin, Aleksandr Y.
    [J]. SIAM JOURNAL ON APPLIED MATHEMATICS, 2020, 80 (02) : 977 - 1002