Principal component analysis for sparse high-dimensional data

Authors
Raiko, Tapani [1]
Ilin, Alexander [1]
Karhunen, Juha [1]
Affiliations
[1] Aalto Univ, Adapt Informat Res Ctr, FI-02015 Helsinki, Finland
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Principal component analysis (PCA) is a widely used technique for data analysis and dimensionality reduction. Eigenvalue decomposition is the standard algorithm for solving PCA, but a number of other algorithms have been proposed. For instance, the EM algorithm is much more efficient when the dimensionality is high and the number of principal components is small. We study the case where the data are high-dimensional and most of the values are missing. In this case, both of these algorithms turn out to be inadequate. We propose using a gradient descent algorithm inspired by Oja's rule and speeding it up with an approximate Newton's method. The computational complexity of the proposed method is linear in the number of observed values in the data and in the number of principal components. In experiments with Netflix data, the proposed algorithm is about ten times faster than any of the four comparison methods.
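To illustrate why the cost can be linear in the number of observed values, here is a minimal sketch of plain gradient descent on the squared reconstruction error, summed over observed entries only. It is not the authors' implementation: it omits the approximate Newton speed-up and any regularization, and the function names (`pca_missing_gd`, `observed_rmse`), the learning rate, and the iteration count are assumptions made for this example.

```python
import numpy as np


def observed_rmse(A, S, rows, cols, vals):
    """Root-mean-square reconstruction error over the observed entries only."""
    pred = np.sum(A[rows] * S[cols], axis=1)
    return float(np.sqrt(np.mean((vals - pred) ** 2)))


def pca_missing_gd(rows, cols, vals, shape, n_components=2,
                   lr=0.01, n_iters=2000, seed=0):
    """Fit X[i, j] ~ A[i] . S[j] using only the observed entries.

    rows, cols, vals: coordinate lists of the observed values.
    shape: (d, n), the full size of the data matrix.
    Each iteration touches every observed value once per component,
    so the per-iteration cost is O(len(vals) * n_components).
    """
    rng = np.random.default_rng(seed)
    d, n = shape
    A = 0.1 * rng.standard_normal((d, n_components))  # loadings, one row per variable
    S = 0.1 * rng.standard_normal((n, n_components))  # scores, one row per sample
    for _ in range(n_iters):
        # Residuals on observed entries; missing entries never enter the sum.
        err = vals - np.sum(A[rows] * S[cols], axis=1)
        # Accumulate negative gradients of 0.5 * sum(err**2) for both factors.
        step_A = np.zeros_like(A)
        step_S = np.zeros_like(S)
        np.add.at(step_A, rows, err[:, None] * S[cols])
        np.add.at(step_S, cols, err[:, None] * A[rows])
        A += lr * step_A
        S += lr * step_S
    return A, S


if __name__ == "__main__":
    # Synthetic rank-2 data with roughly half of the entries observed (illustrative only).
    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
    mask = rng.random(X.shape) < 0.5
    rows, cols = np.nonzero(mask)
    A, S = pca_missing_gd(rows, cols, X[rows, cols], X.shape)
    print("RMSE on observed entries:", observed_rmse(A, S, rows, cols, X[rows, cols]))
```

The factors returned by this sketch span a low-dimensional subspace but are not orthogonalized or ordered by variance; a separate post-processing step would be needed to recover principal components in the usual sense.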
Pages: 566-575
Number of pages: 10