Graph-based sparse linear discriminant analysis for high-dimensional classification

Cited by: 10
Authors
Liu, Jianyu [1]
Yu, Guan [2]
Liu, Yufeng [1, 3, 4, 5]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
[2] SUNY Buffalo, Dept Biostat, Buffalo, NY 14214 USA
[3] Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
[4] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[5] Univ N Carolina, Carolina Ctr Genome Sci, Chapel Hill, NC 27599 USA
Funding
National Science Foundation (USA)
Keywords
Feature structure; Gaussian graphical models; Regularization; Undirected graph; VARIABLE SELECTION; MODEL SELECTION; PENALIZED REGRESSION; NETWORK; LASSO;
DOI
10.1016/j.jmva.2018.12.007
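Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code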
020208 ; 070103 ; 0714 ;
Abstract
Linear discriminant analysis (LDA) is a well-known classification technique that has enjoyed great success in practical applications. Despite its effectiveness for traditional low-dimensional problems, extensions of LDA are necessary in order to classify high-dimensional data. Many variants of LDA have been proposed in the literature. However, most of these methods do not fully incorporate the structure information among predictors when such information is available. In this paper, we introduce a new high-dimensional LDA technique, namely graph-based sparse LDA (GSLDA), that utilizes the graph structure among the features. In particular, we use the regularized regression formulation for penalized LDA techniques, and propose to impose a structure-based sparse penalty on the discriminant vector beta. The graph structure can be either given or estimated from the training data. Moreover, we explore the relationship between the within-class feature structure and the overall feature structure. Based on this relationship, we further propose a variant of GSLDA that effectively utilizes unlabeled data, which can be abundant in the semi-supervised learning setting. With the new regularization, we can obtain a sparse estimate of beta and more accurate and interpretable classifiers than many existing methods. Both the selection consistency of the beta estimate and the convergence rate of the classifier are established, and the error rate of the resulting classifier converges asymptotically to the Bayes error rate. Finally, we demonstrate the competitive performance of the proposed GSLDA on both simulated and real data studies. (C) 2018 Elsevier Inc. All rights reserved.
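The "regularized regression formulation" mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard fact that, for two classes, least squares on suitably coded labels yields a coefficient vector proportional to the classical LDA direction, so a sparse penalty can be added to the regression. This is not the authors' GSLDA: the abstract does not state the structure-based penalty, so a plain l1 (lasso) penalty solved by proximal gradient descent is used here as a stand-in assumption. All function names, the label coding, and the simulated data are hypothetical and chosen for illustration only.

```python
import numpy as np

def coded_response(y):
    """Code labels as -n/n0 (class 0) and n/n1 (class 1); the coded values sum to zero."""
    n, n0, n1 = len(y), np.sum(y == 0), np.sum(y == 1)
    return np.where(y == 1, n / n1, -n / n0)

def lda_direction(X, y):
    """Classical LDA direction: pooled within-class covariance^{-1} (mean_1 - mean_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    R0, R1 = X0 - X0.mean(0), X1 - X1.mean(0)
    Sw = (R0.T @ R0 + R1.T @ R1) / (len(y) - 2)
    return np.linalg.solve(Sw, X1.mean(0) - X0.mean(0))

def regression_direction(X, y):
    """Unpenalized least squares on the coded labels (features centered)."""
    Xc = X - X.mean(0)
    beta, *_ = np.linalg.lstsq(Xc, coded_response(y), rcond=None)
    return beta

def sparse_regression_direction(X, y, lam, n_iter=500):
    """Stand-in sparse variant: minimize (1/2n)||z - Xc b||^2 + lam * ||b||_1 by ISTA.
    NOTE: the l1 penalty is an assumption, not the paper's graph-based penalty."""
    n = len(y)
    Xc, z = X - X.mean(0), coded_response(y)
    step = n / np.linalg.norm(Xc, 2) ** 2      # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = beta - step * (Xc.T @ (Xc @ beta - z) / n)
        beta = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft-threshold
    return beta

def simulate(seed=0, n=200, p=10):
    """Hypothetical two-class Gaussian data; only the first two features differ in mean."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, p))
    X[y == 1, :2] += 1.5
    return X, y

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    X, y = simulate()
    print("cosine(LDA, regression):", cosine(lda_direction(X, y), regression_direction(X, y)))
    print("sparse beta:", np.round(sparse_regression_direction(X, y, lam=0.3), 3))
```

The cosine similarity between the two unpenalized directions should be 1 up to floating-point error, which is what justifies penalizing the regression instead of the LDA problem directly. A graph-aware method would replace the soft-thresholding step with a penalty that uses the feature graph.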
Pages: 250-269
Page count: 20
Related papers
50 in total
  • [1] On sparse linear discriminant analysis algorithm for high-dimensional data classification
    Ng, Michael K.
    Liao, Li-Zhi
    Zhang, Leihong
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2011, 18 (02) : 223 - 235
  • [2] Sparse Graph-Based Discriminant Analysis for Hyperspectral Imagery
    Ly, Nam Hoai
    Du, Qian
    Fowler, James E.
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2014, 52 (07) : 3872 - 3884
  • [3] A Hybrid Dimension Reduction Based Linear Discriminant Analysis for Classification of High-Dimensional Data
    Zorarpaci, Ezgi
    2021 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC 2021), 2021 : 1028 - 1036
  • [4] Weighted linear programming discriminant analysis for high-dimensional binary classification
    Wu, Yufei
    Yu, Guan
    STATISTICAL ANALYSIS AND DATA MINING, 2020, 13 (05) : 437 - 450
  • [5] Modified linear discriminant analysis approaches for classification of high-dimensional microarray data
    Xu, Ping
    Brock, Guy N.
    Parrish, Rudolph S.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2009, 53 (05) : 1674 - 1687
  • [6] A modified linear discriminant analysis for high-dimensional data
    Hyodo, Masashi
    Yamada, Takayuki
    Himeno, Tetsuto
    Seo, Takashi
    HIROSHIMA MATHEMATICAL JOURNAL, 2012, 42 (02) : 209 - 231
  • [7] A CONVEX OPTIMIZATION APPROACH TO HIGH-DIMENSIONAL SPARSE QUADRATIC DISCRIMINANT ANALYSIS
    Cai, T. Tony
    Zhang, Linjun
    ANNALS OF STATISTICS, 2021, 49 (03) : 1537 - 1568
  • [8] Robust landmark graph-based clustering for high-dimensional data
    Yang, Ben
    Wu, Jinghan
    Sun, Aoran
    Gao, Naying
    Zhang, Xuetao
    NEUROCOMPUTING, 2022, 496 : 72 - 84
  • [9] Improved Graph-Based Metrics for Clustering High-Dimensional Datasets
    Baya, Ariel E.
    Granitto, Pablo M.
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2010, 2010, 6433 : 184 - 193