Towards a Theoretical Analysis of PCA for Heteroscedastic Data

Cited by: 0
Authors
Hong, David [1]
Balzano, Laura [1]
Fessler, Jeffrey A. [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Principal Component Analysis (PCA) is a method for estimating a subspace given noisy samples. It is useful in a variety of problems ranging from dimensionality reduction to anomaly detection and the visualization of high dimensional data. PCA performs well in the presence of moderate noise and even with missing data, but is also sensitive to outliers. PCA is also known to have a phase transition when noise is independent and identically distributed; recovery of the subspace sharply declines at a threshold noise variance. Effective use of PCA requires a rigorous understanding of these behaviors. This paper provides a step towards an analysis of PCA for samples with heteroscedastic noise, that is, samples that have non-uniform noise variances and so are no longer identically distributed. In particular, we provide a simple asymptotic prediction of the recovery of a one-dimensional subspace from noisy heteroscedastic samples. The prediction enables: a) easy and efficient calculation of the asymptotic performance, and b) qualitative reasoning to understand how PCA is impacted by heteroscedasticity (such as outliers).
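The setting described in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's asymptotic prediction: it draws samples from a one-dimensional subspace, adds noise whose variance differs between groups of samples, and measures how well the leading principal component recovers the true direction. The function name `pca_recovery`, the signal amplitude `theta`, and the group sizes and variances are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def pca_recovery(d, n_per_group, noise_vars, theta=1.0, seed=0):
    """Simulate PCA on a rank-1 signal with group-wise noise variances.

    d            -- ambient dimension
    n_per_group  -- number of samples in each noise group
    noise_vars   -- noise variance for each group (same length as n_per_group)
    theta        -- signal amplitude (hypothetical parameter for this sketch)

    Returns the squared inner product between the estimated and the true
    unit direction: 1 means perfect recovery, 0 means none.
    """
    rng = np.random.default_rng(seed)

    # True one-dimensional subspace, represented by a unit vector.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)

    blocks = []
    for n_k, var_k in zip(n_per_group, noise_vars):
        coeffs = rng.standard_normal(n_k)  # random coefficients along u
        noise = np.sqrt(var_k) * rng.standard_normal((d, n_k))
        blocks.append(theta * np.outer(u, coeffs) + noise)
    samples = np.hstack(blocks)  # d x n data matrix, columns are samples

    # The model is zero-mean, so no centering is applied; the leading left
    # singular vector is the first principal component.
    left_vectors, _, _ = np.linalg.svd(samples, full_matrices=False)
    u_hat = left_vectors[:, 0]
    return float(np.dot(u_hat, u) ** 2)


if __name__ == "__main__":
    # Same sample counts; the second run gives 100 samples a much larger
    # noise variance, mimicking a group of outlier-like samples.
    print("homoscedastic:  ", pca_recovery(50, [400, 100], [0.01, 0.01]))
    print("heteroscedastic:", pca_recovery(50, [400, 100], [0.01, 25.0]))
```

Comparing the two printed values gives a rough sense of how a minority of very noisy samples can degrade the recovered direction; sweeping the second variance would trace out the decline in recovery for this particular simulated setup.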
Pages: 496-503
Number of pages: 8
Related papers
50 in total
  • [1] PROBABILISTIC PCA FOR HETEROSCEDASTIC DATA
    Hong, David
    Balzano, Laura
    Fessler, Jeffrey A.
    [J]. 2019 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2019), 2019: 26-30
  • [2] HePPCAT: Probabilistic PCA for Data With Heteroscedastic Noise
    Hong, David
    Gilman, Kyle
    Balzano, Laura
    Fessler, Jeffrey A.
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69: 4819-4834
  • [3] Optimally Weighted PCA for High-Dimensional Heteroscedastic Data
    Hong, David
    Yang, Fan
    Fessler, Jeffrey A.
    Balzano, Laura
    [J]. SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2023, 5 (01): 222-250
  • [4] Asymptotic performance of PCA for high-dimensional heteroscedastic data
    Hong, David
    Balzano, Laura
    Fessler, Jeffrey A.
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 167: 435-452
  • [5] Optimal Spectral Shrinkage and PCA With Heteroscedastic Noise
    Leeb, William
    Romanov, Elad
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2021, 67 (05): 3009-3037
  • [6] Heteroscedastic regression analysis method for mixed data
    FU Hui-min
    [J]. 航空动力学报, 2011, 26 (04): 721-726
  • [7] Analysis Of Messy Data With Heteroscedastic In Mean Models
    Trianasari, Nurvita
    Sumarni, Cucu
    [J]. APPLICATION OF MATHEMATICS IN INDUSTRY AND LIFE, 2016, 1716
  • [8] Analysis of unbalanced factorial designs with heteroscedastic data
    Vallejo, G.
    Fernandez, M. P.
    Livacic-Rojas, P. E.
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2010, 80 (1-2): 75-88
  • [9] Principal Component Analysis (PCA) for Powder Diffraction Data: Towards Unblinded Applications
    Chernyshov, Dmitry
    Dovgaliuk, Iurii
    Dyadkin, Vadim
    van Beek, Wouter
    [J]. CRYSTALS, 2020, 10 (07): 1-16
  • [10] Probabilistic PCA from Heteroscedastic Signals: Geometric Framework and Application to Clustering
    Collas, Antoine
    Bouchard, Florent
    Breloy, Arnaud
    Ginolhac, Guillaume
    Ren, Chengfang
    Ovarlez, Jean-Philippe
    [J]. IEEE Transactions on Signal Processing, 2021, 69: 6546-6560