INFERENCE FOR HETEROSKEDASTIC PCA WITH MISSING DATA

Cited by: 2
Authors
Yan, Yuling [1]
Chen, Yuxin [2]
Fan, Jianqing [3]
Affiliations
[1] MIT, Inst Data Syst & Soc, Cambridge, MA 02144 USA
[2] Univ Penn, Wharton Sch, Dept Stat & Data Sci, Philadelphia, PA USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
Source
ANNALS OF STATISTICS | 2024 / Vol. 52 / No. 02
Keywords
Principal component analysis; confidence regions; missing data; uncertainty quantification; heteroskedastic data; subspace estimation; LOW-RANK MATRIX; CONFIDENCE-INTERVALS; UNCERTAINTY QUANTIFICATION; PRINCIPAL COMPONENTS; SINGULAR SUBSPACES; LARGEST EIGENVALUE; ROBUST REGRESSION; GRADIENT DESCENT; COMPLETION; NOISY;
DOI
10.1214/24-AOS2366
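Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code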
020208 ; 070103 ; 0714 ;
Abstract
This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly underexplored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a novel approach to performing valid inference on the principal subspace, on the basis of an estimator called HeteroPCA. We develop nonasymptotic distributional guarantees for HeteroPCA, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Our inference procedures are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels.
Pages: 729 - 756
Page count: 28
Related papers
50 in total
  • [21] IDENTIFICATION AND INFERENCE WITH NONIGNORABLE MISSING COVARIATE DATA
    Miao, Wang
    Tchetgen, Eric Tchetgen
    STATISTICA SINICA, 2018, 28 (04) : 2049 - 2067
  • [22] Bayesian nonparametric for causal inference and missing data
    Chen, Li-Pang
    BIOMETRICS, 2024, 80 (01)
  • [23] Inference of missing data in photovoltaic monitoring datasets
    Koubli, Eleni
    Palmer, Diane
    Rowley, Paul
    Gottschalg, Ralph
    IET RENEWABLE POWER GENERATION, 2016, 10 (04) : 434 - 439
  • [24] Haplotype and missing data inference in nuclear families
    Lin, S
    Chakravarti, A
    Cutler, DJ
    GENOME RESEARCH, 2004, 14 (08) : 1624 - 1632
  • [25] IDENTIFICATION AND INFERENCE ON REGRESSIONS WITH MISSING COVARIATE DATA
    Aucejo, Esteban M.
    Bugni, Federico A.
    Hotz, V. Joseph
    ECONOMETRIC THEORY, 2017, 33 (01) : 196 - 241
  • [26] Inference of stochastic time series with missing data
    Lee, Sangwon
    Periwal, Vipul
    Jo, Junghyo
    PHYSICAL REVIEW E, 2021, 104 (02)
  • [27] Testing inference in heteroskedastic fixed effects models
    Uchoa, Carlos F. A.
    Cribari-Neto, Francisco
    Menezes, Tatiane A.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2014, 235 (03) : 660 - 670
  • [28] Local PCA Regression for Missing Data Estimation in Telecommunication Dataset
    Sato, T.
    Huang, B. Q.
    Huang, Y.
    Kechadi, M. -T.
    PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 2010, 6230 : 668 - 673
  • [29] Likelihood-based Inference with Missing Data Under Missing-at-Random
    Yang, Shu
    Kim, Jae Kwang
    SCANDINAVIAN JOURNAL OF STATISTICS, 2016, 43 (02) : 436 - 454
  • [30] The effect of sample size and missingness on inference with missing data
    Morimoto, Julian
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2024, 53 (09) : 3292 - 3311