INFERENCE FOR HETEROSKEDASTIC PCA WITH MISSING DATA

Cited by: 1
Authors
Yan, Yuling [1]
Chen, Yuxin [2]
Fan, Jianqing [3]
Affiliations
[1] MIT, Inst Data Syst & Soc, Cambridge, MA 02144 USA
[2] Univ Penn, Wharton Sch, Dept Stat & Data Sci, Philadelphia, PA USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
Source
ANNALS OF STATISTICS | 2024, Vol. 52, No. 2
Keywords
Principal component analysis; confidence regions; missing data; uncertainty quantification; heteroskedastic data; subspace estimation; LOW-RANK MATRIX; CONFIDENCE-INTERVALS; UNCERTAINTY QUANTIFICATION; PRINCIPAL COMPONENTS; SINGULAR SUBSPACES; LARGEST EIGENVALUE; ROBUST REGRESSION; GRADIENT DESCENT; COMPLETION; NOISY;
DOI
10.1214/24-AOS2366
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly underexplored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a novel approach to performing valid inference on the principal subspace, on the basis of an estimator called HeteroPCA. We develop nonasymptotic distributional guarantees for HeteroPCA, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Our inference procedures are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels.
Pages: 729-756
Page count: 28