Cauchy robust principal component analysis with applications to high-dimensional data sets

Cited by: 1
Authors
Fayomi, Aisha [1]
Pantazis, Yannis [2]
Tsagris, Michail [3]
Wood, Andrew T. A. [4]
Affiliations
[1] King Abdulaziz Univ, Dept Stat, Abdullah Sulayman St, Mecca 21589, Saudi Arabia
[2] Fdn Res & Technol Hellas, Inst Appl & Computat Math, Vassilika 70013, Greece
[3] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece
[4] Australian Natl Univ, Res Sch Finance Actuarial Studies & Stat, 26C Kingsley St, Canberra, ACT 0200, Australia
Keywords
Principal component analysis; Robust; Cauchy log-likelihood; High-dimensional data; Projection; Location
DOI
10.1007/s11222-023-10328-x
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood typically outperform, or are on a par with, existing robust PCA techniques. Moreover, the Cauchy PCA algorithm we have used has much lower computational cost in very high dimensional settings than the other public domain robust PCA methods we consider.
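The abstract does not spell out the algorithm, so the following is only a minimal two-dimensional sketch under a stated assumption: the first component is taken to be the unit direction whose projections have the smallest maximised univariate Cauchy log-likelihood, by analogy with the Gaussian case, where the maximised log-likelihood of the projections decreases as their variance grows. The helper `cauchy_profile_loglik`, the EM-style fit, the grid search over angles and the simulated data are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def cauchy_profile_loglik(z, iters=200):
    """Maximised univariate Cauchy log-likelihood of z.

    Location and scale are fitted with the standard EM iteration for a
    Student-t distribution with one degree of freedom (hypothetical helper).
    """
    mu = np.median(z)
    sigma = np.median(np.abs(z - mu)) + 1e-12
    for _ in range(iters):
        w = 2.0 / (1.0 + ((z - mu) / sigma) ** 2)  # EM weights for 1 d.f.
        mu = np.sum(w * z) / np.sum(w)
        sigma = np.sqrt(np.mean(w * (z - mu) ** 2))
    return np.sum(np.log(sigma) - np.log(np.pi) - np.log(sigma**2 + (z - mu) ** 2))

# Simulated data (an assumption for illustration): most of the spread lies
# along true_dir, plus a few gross outliers along the orthogonal direction.
rng = np.random.default_rng(0)
true_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)
orth_dir = np.array([1.0, -1.0]) / np.sqrt(2.0)
inliers = (3.0 * rng.standard_normal((300, 1)) * true_dir
           + 0.5 * rng.standard_normal((300, 1)) * orth_dir)
outliers = 56.0 * orth_dir + rng.standard_normal((6, 2))
X = np.vstack([inliers, outliers])

# Classical PCA: leading eigenvector of the sample covariance matrix.
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
u_pca = evecs[:, -1]

# Cauchy-based direction: grid search over unit vectors in the plane,
# keeping the one with the smallest maximised Cauchy log-likelihood.
angles = np.deg2rad(np.arange(180))
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
scores = np.array([cauchy_profile_loglik(X @ u) for u in dirs])
u_cauchy = dirs[np.argmin(scores)]

print("alignment with true direction (classical PCA):", abs(u_pca @ true_dir))
print("alignment with true direction (Cauchy sketch):", abs(u_cauchy @ true_dir))
```

In this toy setting the handful of outliers dominates the sample covariance, so the classical component follows them, while the Cauchy criterion grows only logarithmically in the outlier distance and stays close to the bulk of the data. A grid search is feasible only in two dimensions; a high-dimensional method would need a proper optimiser over the unit sphere.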
Pages: 14
Related papers
50 in total
  • [1] Cauchy robust principal component analysis with applications to high-dimensional data sets
    Fayomi, Aisha
    Pantazis, Yannis
    Tsagris, Michail
    Wood, Andrew T. A.
    [J]. Statistics and Computing, 2024, 34
  • [2] High-dimensional robust principal component analysis and its applications
    Jiang, Xiaobo
    Gao, Jie
    Yang, Zhongming
    [J]. JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2023, 23 (05) : 2303 - 2311
  • [3] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    [J]. NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [4] Multilevel Functional Principal Component Analysis for High-Dimensional Data
    Zipunnikov, Vadim
    Caffo, Brian
    Yousem, David M.
    Davatzikos, Christos
    Schwartz, Brian S.
    Crainiceanu, Ciprian
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2011, 20 (04) : 852 - 873
  • [5] Tensor robust principal component analysis with total generalized variation for high-dimensional data recovery
    Xu, Zhi
    Yang, Jing-Hua
    Wang, Chuan-long
    Wang, Fusheng
    Yan, Xi-hong
    [J]. APPLIED MATHEMATICS AND COMPUTATION, 2024, 483
  • [6] On principal component analysis for high-dimensional XCSR
    Behdad, Mohammad
    French, Tim
    Barone, Luigi
    Bennamoun, Mohammed
    [J]. EVOLUTIONARY INTELLIGENCE, 2012, 5 (02) : 129 - 138
  • [7] Adaptive local Principal Component Analysis improves the clustering of high-dimensional data
    Migenda, Nico
    Moeller, Ralf
    Schenck, Wolfram
    [J]. PATTERN RECOGNITION, 2024, 146
  • [8] Exploring high-dimensional biological data with sparse contrastive principal component analysis
    Boileau, Philippe
    Hejazi, Nima S.
    Dudoit, Sandrine
    [J]. BIOINFORMATICS, 2020, 36 (11) : 3422 - 3430
  • [9] High-dimensional principal component analysis with heterogeneous missingness
    Zhu, Ziwei
    Wang, Tengyao
    Samworth, Richard J.
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (05) : 2000 - 2031
  • [10] PRINCIPAL COMPONENT ANALYSIS IN VERY HIGH-DIMENSIONAL SPACES
    Lee, Young Kyung
    Lee, Eun Ryung
    Park, Byeong U.
    [J]. STATISTICA SINICA, 2012, 22 (03) : 933 - 956