MINIMAX BOUNDS FOR SPARSE PCA WITH NOISY HIGH-DIMENSIONAL DATA

Cited by: 93
Authors
Birnbaum, Aharon [1]
Johnstone, Iain M. [2]
Nadler, Boaz [3]
Paul, Debashis [4]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem 91904, Israel
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Weizmann Inst Sci, Dept Comp Sci & Appl Math, Rehovot 76100, Israel
[4] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
Source
ANNALS OF STATISTICS | 2013 / Vol. 41 / No. 03
Funding
National Science Foundation (USA)
Keywords
Minimax risk; high-dimensional data; principal component analysis; sparsity; spiked covariance model; PRINCIPAL-COMPONENTS-ANALYSIS; CONSISTENCY; RATES;
DOI
10.1214/12-AOS1014
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the l_2 loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.
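To make the setting in the abstract concrete, the following is a minimal sketch, assuming a single-spike covariance model with identity noise and a sparse unit-norm leading eigenvector. It compares plain PCA with a simple one-stage coordinate selection by sample variance. This is not the two-stage scheme proposed in the paper; the function names (`simulate_spiked`, `l2_loss`, `thresholded_pca`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_spiked(n, p, theta, spike, rng):
    """Draw n mean-zero Gaussian rows with covariance spike * theta theta^T + I.

    Assumption: a single spike and unit noise variance (illustrative only).
    """
    scores = np.sqrt(spike) * rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, p))
    return scores @ theta[None, :] + noise


def l2_loss(est, truth):
    """Squared l_2 distance between unit vectors, minimized over the sign."""
    sign = 1.0 if est @ truth >= 0 else -1.0
    return float(np.sum((est - sign * truth) ** 2))


def leading_eigvec(X):
    """Leading eigenvector of the (uncentered) sample covariance of X."""
    S = X.T @ X / X.shape[0]
    _, vecs = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


def thresholded_pca(X, k):
    """Keep the k coordinates with largest sample variance, then run PCA on them.

    A one-stage illustration of coordinate selection, not the paper's method.
    """
    variances = np.mean(X ** 2, axis=0)  # data are mean zero by construction
    keep = np.argsort(variances)[-k:]
    v = np.zeros(X.shape[1])
    v[keep] = leading_eigvec(X[:, keep])
    return v


rng = np.random.default_rng(0)
n, p, k = 200, 1000, 10
theta = np.zeros(p)
theta[:k] = 1.0 / np.sqrt(k)  # sparse unit-norm eigenvector

X = simulate_spiked(n, p, theta, spike=5.0, rng=rng)
loss_plain = l2_loss(leading_eigvec(X), theta)
est_sparse = thresholded_pca(X, k)
loss_sparse = l2_loss(est_sparse, theta)
print(f"plain PCA loss: {loss_plain:.3f}, selected-coordinate loss: {loss_sparse:.3f}")
```

Because the loss is minimized over the sign of the estimate, it always lies between 0 and 2 for unit vectors. The sparsity level `k` is treated as known here, which a practical procedure would not assume.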
Pages: 1055-1084
Page count: 30
Related papers
50 items in total
  • [31] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    [J]. NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [32] Sparse kernel methods for high-dimensional survival data
    Evers, Ludger
    Messow, Claudia-Martina
    [J]. BIOINFORMATICS, 2008, 24 (14) : 1632 - 1638
  • [33] Sparse meta-analysis with high-dimensional data
    He, Qianchuan
    Zhang, Hao Helen
    Avery, Christy L.
    Lin, D. Y.
    [J]. BIOSTATISTICS, 2016, 17 (02) : 205 - 220
  • [34] Efficient Sparse Representation for Learning With High-Dimensional Data
    Chen, Jie
    Yang, Shengxiang
    Wang, Zhu
    Mao, Hua
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4208 - 4222
  • [35] Ensemble of sparse classifiers for high-dimensional biological data
    Kim, Sunghan
    Scalzo, Fabien
    Telesca, Donatello
    Hu, Xiao
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (02) : 167 - 183
  • [36] Subspace Clustering of Very Sparse High-Dimensional Data
    Peng, Hankui
    Pavlidis, Nicos
    Eckley, Idris
    Tsalamanis, Ioannis
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 3780 - 3783
  • [37] Single-Pass PCA of Large High-Dimensional Data
    Yu, Wenjian
    Gu, Yu
    Li, Jian
    Liu, Shenghua
    Li, Yaohang
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3350 - 3356
  • [38] Robust PCA for high-dimensional data based on characteristic transformation
    He, Lingyu
    Yang, Yanrong
    Zhang, Bo
    [J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2023, 65 (02) : 127 - 151
  • [39] A Robust Supervised Variable Selection for Noisy High-Dimensional Data
    Kalina, Jan
    Schlenker, Anna
    [J]. BIOMED RESEARCH INTERNATIONAL, 2015, 2015
  • [40] Computing high-dimensional invariant distributions from noisy data
    Lin, Bo
    Li, Qianxiao
    Ren, Weiqing
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2023, 474