PCA Consistency for Non-Gaussian Data in High Dimension, Low Sample Size Context

Cited by: 33
Authors
Yata, Kazuyoshi [2]
Aoshima, Makoto [1]
Affiliations
[1] Univ Tsukuba, Inst Math, Tsukuba 3058571, Japan
[2] Univ Tsukuba, Grad Sch Pure & Appl Sci, Tsukuba 3058571, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Consistency; Dual covariance matrix; Eigenvalue distribution; HDLSS; Large p small n; Principal component analysis; Random matrix theory; Sample size; GEOMETRIC REPRESENTATION; COVARIANCE MATRICES; LARGEST EIGENVALUE;
DOI
10.1080/03610910902936083
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
In this article, we investigate both sample eigenvalues and Principal Component (PC) directions along with PC scores when the dimension d and the sample size n both grow to infinity in such a way that n is much lower than d. We consider general settings that include the case when the eigenvalues are all in the range of sphericity. We do not assume either the normality or a ρ-mixing condition. We attempt finding a difference among the eigenvalues by choosing n with a suitable order of d. We give the consistency properties for both the sample eigenvalues and the PC directions along with the PC scores. We also show that the sample eigenvalue has a Gaussian limiting distribution when the population counterpart is of multiplicity one.
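The keywords list the dual covariance matrix, which is the usual computational device when n is much smaller than d. The sketch below is not taken from the article. It illustrates the standard identity that the n x n dual matrix S_D = X^T X / (n - 1) shares its nonzero eigenvalues with the d x d sample covariance matrix S = X X^T / (n - 1), and that a PC direction can be recovered as h = X u / sqrt((n - 1) * lambda). The data model (uniform entries with one inflated coordinate), the sizes d = 200 and n = 10, and all function names are assumptions made for illustration.

```python
import math
import random


def centered_data(d, n, seed=0):
    """Return a d x n data matrix (list of rows), each row mean-centered."""
    rng = random.Random(seed)
    # Assumed toy model: non-Gaussian (uniform) entries, first coordinate inflated.
    X = [[rng.uniform(-1.0, 1.0) * (10.0 if i == 0 else 1.0) for _ in range(n)]
         for i in range(d)]
    for row in X:
        mean = sum(row) / n
        for j in range(n):
            row[j] -= mean
    return X


def dual_matrix(X, n):
    """n x n dual sample covariance matrix S_D = X^T X / (n - 1)."""
    return [[sum(row[j] * row[k] for row in X) / (n - 1) for k in range(n)]
            for j in range(n)]


def power_iteration(A, iters=500):
    """Largest eigenvalue and a unit eigenvector of a symmetric PSD matrix."""
    n = len(A)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    return sum(v[i] * Av[i] for i in range(n)), v


def pc_direction_from_dual(X, lam, u, n):
    """Map a dual eigenvector u to the d-dimensional PC direction h."""
    scale = math.sqrt((n - 1) * lam)
    return [sum(row[j] * u[j] for j in range(n)) / scale for row in X]


def apply_sample_covariance(X, h, n):
    """Compute S h with S = X X^T / (n - 1) without forming the d x d matrix."""
    t = [sum(X[i][j] * h[i] for i in range(len(X))) for j in range(n)]  # X^T h
    return [sum(row[j] * t[j] for j in range(n)) / (n - 1) for row in X]


d, n = 200, 10
X = centered_data(d, n)
lam, u = power_iteration(dual_matrix(X, n))
h = pc_direction_from_dual(X, lam, u, n)
Sh = apply_sample_covariance(X, h, n)
residual = max(abs(Sh[i] - lam * h[i]) for i in range(d))
print(f"largest eigenvalue of the dual matrix: {lam:.4f}")
print(f"max |S h - lambda h|: {residual:.2e}")
```

Working with the n x n matrix keeps the cost at order n^2 d rather than d^3. This shows only the algebraic equivalence, not the consistency results the abstract describes.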
Pages: 2634-2652
Number of pages: 19
Related papers
50 in total
  • [21] CLUSTERING HIGH DIMENSION, LOW SAMPLE SIZE DATA USING THE MAXIMAL DATA PILING DISTANCE
    Ahn, Jeongyoun
    Lee, Myung Hee
    Yoon, Young Joo
    [J]. STATISTICA SINICA, 2012, 22 (02) : 443 - 464
  • [22] A dimension reduction technique applied to regression on high dimension, low sample size neurophysiological data sets
    Adrielle C. Santana
    Adriano V. Barbosa
    Hani C. Yehia
    Rafael Laboissière
    [J]. BMC Neuroscience, 22
  • [23] A dimension reduction technique applied to regression on high dimension, low sample size neurophysiological data sets
    Santana, Adrielle C.
    Barbosa, Adriano V.
    Yehia, Hani C.
    Laboissiere, Rafael
    [J]. BMC NEUROSCIENCE, 2021, 22 (01)
  • [24] A survey of high dimension low sample size asymptotics
    Aoshima, Makoto
    Shen, Dan
    Shen, Haipeng
    Yata, Kazuyoshi
    Zhou, Yi-Hui
    Marron, J. S.
    [J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2018, 60 (01) : 4 - 19
  • [25] Consistency relations for sharp inflationary non-Gaussian features
    Mooij, Sander
    Palma, Gonzalo A.
    Panotopoulos, Grigoris
    Soto, Alex
    [J]. JOURNAL OF COSMOLOGY AND ASTROPARTICLE PHYSICS, 2016, (09):
  • [26] Distance-based outlier detection for high dimension, low sample size data
    Ahn, Jeongyoun
    Lee, Myung Hee
    Lee, Jung Ae
    [J]. JOURNAL OF APPLIED STATISTICS, 2019, 46 (01) : 13 - 29
  • [27] Biobjective gradient descent for feature selection on high dimension, low sample size data
    Issa, Tina
    Angel, Eric
    Zehraoui, Farida
    [J]. PLOS ONE, 2024, 19 (07):
  • [28] Multiclass Classification on High Dimension and Low Sample Size Data Using Genetic Programming
    Wei, Tingyang
    Liu, Wei-Li
    Zhong, Jinghui
    Gong, Yue-Jiao
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2022, 10 (02) : 704 - 718
  • [29] Discriminating Tensor Spectral Clustering for High-Dimension-Low-Sample-Size Data
    Hu, Yu
    Qi, Fei
    Cheung, Yiu-Ming
    Cai, Hongmin
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [30] Statistical Significance of Clustering for High-Dimension, Low-Sample Size Data
    Liu, Yufeng
    Hayes, David Neil
    Nobel, Andrew
    Marron, J. S.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2008, 103 (483) : 1281 - 1293