Robust PCA for high-dimensional data based on characteristic transformation

Cited by: 1
Authors
He, Lingyu [1]
Yang, Yanrong [2]
Zhang, Bo [3]
Affiliations
[1] Hunan Univ, Changsha, Peoples R China
[2] Australian Natl Univ, Canberra, Australia
[3] Univ Sci & Technol China, Int Inst Finance Sch Management, Dept Stat & Finance, Hefei 230026, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
characteristic function; heavy-tailed data; high-dimensional data; kernel PCA; robust PCA; spiked covariance model; PRINCIPAL COMPONENT ANALYSIS; EIGENVALUES; COVARIANCE;
DOI
10.1111/anzs.12385
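Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code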
020208 ; 070103 ; 0714 ;
Abstract
In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular heavy tails and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of classical PCA. The suggested method has the distinct advantage of handling heavy-tailed data, whose covariances may not exist (they may be infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and gains its robustness and non-linearity from a bounded, non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including an upper bound on the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over classical PCA. Finally, using data on protein expression in mice of various genotypes from a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than classical PCA.
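The abstract does not give the exact transformation, so the following is only a rough sketch of the general idea of kernel PCA with a bounded kernel on heavy-tailed data, not the authors' estimator. It assumes a Gaussian kernel, which is bounded in [0, 1] and coincides with the characteristic function of a Gaussian distribution evaluated at the difference of two observations. The helper names (`squared_distances`, `kernel_pca`), the Cauchy test data and the median-heuristic bandwidth are all illustrative assumptions.

```python
import numpy as np


def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)       # clip small negatives from rounding
    d2 = 0.5 * (d2 + d2.T)         # enforce exact symmetry
    np.fill_diagonal(d2, 0.0)
    return d2


def kernel_pca(K, n_components):
    """Eigen-decompose the doubly centred Gram matrix (standard kernel PCA)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    Kc = H @ K @ H
    Kc = 0.5 * (Kc + Kc.T)
    vals, vecs = np.linalg.eigh(Kc)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    vals = vals[order]
    # Component scores of the n observations in feature space.
    scores = vecs[:, order] * np.sqrt(np.maximum(vals, 0.0))
    return vals, scores


# Assumed test data: i.i.d. Cauchy entries, whose variances do not exist.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_cauchy(size=(n, p))

d2 = squared_distances(X)
# Assumed bandwidth choice: median heuristic on the squared distances.
sigma2 = np.median(d2[np.triu_indices(n, k=1)])
K = np.exp(-d2 / (2.0 * sigma2))          # bounded kernel, entries in [0, 1]

vals, scores = kernel_pca(K, n_components=2)
```

Because every entry of `K` lies in [0, 1], the trace of the centred Gram matrix is at most `n`, so no single extreme observation can inflate the spectrum the way it can for the sample covariance matrix.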
Pages: 127-151
Number of pages: 25
Related papers
50 in total
  • [1] Robust PCA for high-dimensional data
    Hubert, M
    Rousseeuw, PJ
    Verboven, S
    [J]. DEVELOPMENTS IN ROBUST STATISTICS, 2003: 169-179
  • [2] Improved Algorithms for High-dimensional Robust PCA
    Lin, Xiaoyong
    Zhang, Zeqiu
    Wang, Jue
    Zhang, Zhaoyang
    Qiu, Tingting
    Mi, Zhengkun
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS WIRELESS BROADBAND (ICUWB2016), 2016,
  • [3] Outlier-Robust PCA: The High-Dimensional Case
    Xu, Huan
    Caramanis, Constantine
    Mannor, Shie
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59(01): 546-572
  • [4] PCA learning for sparse high-dimensional data
    Hoyle, DC
    Rattray, M
    [J]. EUROPHYSICS LETTERS, 2003, 62(01): 117-123
  • [5] Sparse PCA for High-Dimensional Data With Outliers
    Hubert, Mia
    Reynkens, Tom
    Schmitt, Eric
    Verdonck, Tim
    [J]. TECHNOMETRICS, 2016, 58(04): 424-434
  • [6] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007: 3961-3966
  • [7] Optimally Weighted PCA for High-Dimensional Heteroscedastic Data
    Hong, David
    Yang, Fan
    Fessler, Jeffrey A.
    Balzano, Laura
    [J]. SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2023, 5(01): 222-250
  • [8] Asymptotic performance of PCA for high-dimensional heteroscedastic data
    Hong, David
    Balzano, Laura
    Fessler, Jeffrey A.
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 167: 435-452
  • [9] Visualization Study of High-dimensional Data Classification Based on PCA-SVM
    Zhao Zhongwen
    Guo Huanghuang
    [J]. 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC), 2017: 346-349
  • [10] Fast Robust Correlation for High-Dimensional Data
    Raymaekers, Jakob
    Rousseeuw, Peter J.
    [J]. TECHNOMETRICS, 2021, 63(02): 184-198