Cauchy robust principal component analysis with applications to high-dimensional data sets

Cited by: 1
Authors
Fayomi, Aisha [1]
Pantazis, Yannis [2]
Tsagris, Michail [3]
Wood, Andrew T. A. [4]
Affiliations
[1] King Abdulaziz Univ, Dept Stat, Abdullah Sulayman St, Mecca 21589, Saudi Arabia
[2] Fdn Res & Technol Hellas, Inst Appl & Computat Math, Vassilika 70013, Greece
[3] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece
[4] Australian Natl Univ, Res Sch Finance Actuarial Studies & Stat, 26C Kingsley St, Canberra, ACT 0200, Australia
Keywords
Principal component analysis; Robust; Cauchy log-likelihood; High-dimensional data; PROJECTION; LOCATION
DOI
10.1007/s11222-023-10328-x
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood typically outperform, or are on a par with, existing robust PCA techniques. Moreover, the Cauchy PCA algorithm we have used has much lower computational cost in very high dimensional settings than the other public domain robust PCA methods we consider.
Pages: 14
Related papers
50 in total
  • [21] Information Analysis of High-Dimensional Data and Applications
    Yang, Xin-She
    Lee, Sanghyuk
    Lee, Sangmin
    Theera-Umpon, Nipon
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [22] Robust statistical methods for high-dimensional data, with applications in tribology
    Pfeiffer, Pia
    Filzmoser, Peter
    [J]. ANALYTICA CHIMICA ACTA, 2023, 1279
  • [23] Curvilinear component analysis: an efficient method for the unfolding and the representation of high-dimensional nonlinear data sets
    Jausions-Picaud, C.
    Herault, J.
    Guerin-Dugue, A.
    Oliva, A.
    [J]. PERCEPTION, 1998, 27 : 151 - 151
  • [24] A new proposal for a principal component-based test for high-dimensional data applied to the analysis of PhyloChip data
    Ding, Guo-Chun
    Smalla, Kornelia
    Heuer, Holger
    Kropf, Siegfried
    [J]. BIOMETRICAL JOURNAL, 2012, 54 (01) : 94 - 107
  • [25] Lagged principal trend analysis for longitudinal high-dimensional data
    Zhang, Yuping
    [J]. STAT, 2019, 8 (01):
  • [26] Joint principal trend analysis for longitudinal high-dimensional data
    Zhang, Yuping
    Ouyang, Zhengqing
    [J]. BIOMETRICS, 2018, 74 (02) : 430 - 438
  • [27] Software Tools for Robust Analysis of High-Dimensional Data
    Todorov, Valentin
    Filzmoser, Peter
    [J]. AUSTRIAN JOURNAL OF STATISTICS, 2014, 43 (04) : 255 - 266
  • [28] Robust analysis of cancer heterogeneity for high-dimensional data
    Cheng, Chao
    Feng, Xingdong
    Li, Xiaoguang
    Wu, Mengyun
    [J]. STATISTICS IN MEDICINE, 2022, 41 (27) : 5448 - 5462
  • [29] Robust regularized cluster analysis for high-dimensional data
    Kalina, Jan
    Vlckova, Katarina
    [J]. MATHEMATICAL METHODS IN ECONOMICS (MME 2014), 2014: 378 - 383
  • [30] High-dimensional Data Classification Based on Principal Component Analysis Dimension Reduction and Improved BP Algorithm
    Yan, Tai-shan
    Wen, Yi-ting
    Li, Wen-bin
    [J]. 2018 INTERNATIONAL CONFERENCE ON COMMUNICATION, NETWORK AND ARTIFICIAL INTELLIGENCE (CNAI 2018), 2018: 441 - 445