Exploring dimension learning via a penalized probabilistic principal component analysis

Cited by: 3
Authors
Deng, Wei Q. [1 ,2 ]
Craiu, Radu, V [3 ]
Affiliations
[1] McMaster Univ, Dept Psychiat & Behav Neurosci, Hamilton, ON, Canada
[2] St Josephs Healthcare Hamilton, Peter Boris Ctr Addict Res, Hamilton, ON, Canada
[3] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Dimension estimation; model selection; penalization; principal component analysis; probabilistic principal component analysis; profile likelihood; SELECTION; COVARIANCE; NUMBER; EIGENVALUES; SHRINKAGE; TESTS;
DOI
10.1080/00949655.2022.2100890
CLC Classification
TP39 [Computer Applications]
Subject Classification
081203; 0835
Abstract
Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of a probabilistic principal component analysis. Unlike other penalized maximization problems that require an 'optimal' penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared to a large number of alternative criteria in simulations and an application to gene expression data. Extensive simulation studies reveal that no method uniformly dominates the others, highlighting the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when model assumptions are moderately violated.
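The penalized profile-likelihood idea in the abstract can be sketched in code. Under the probabilistic PCA model of Tipping and Bishop, the profile log-likelihood at a candidate dimension q depends only on the eigenvalues of the sample covariance matrix. The sketch below is illustrative only: the parameter-count penalty, the BIC-style log(n) scaling, the penalty grid, and the modal-vote aggregation are assumptions standing in for the paper's actual criterion and data-averaging rule.

```python
import numpy as np

def ppca_profile_loglik(eigvals, q, n):
    """Maximized PPCA log-likelihood at dimension q (Tipping & Bishop, 1999),
    given the descending eigenvalues of the sample covariance matrix."""
    p = len(eigvals)
    noise = eigvals[q:].mean()  # ML estimate of the residual variance
    return -(n / 2.0) * (np.sum(np.log(eigvals[:q]))
                         + (p - q) * np.log(noise)
                         + p * np.log(2 * np.pi) + p)

def estimate_dimension(X, alphas=np.linspace(0.5, 2.0, 16)):
    """Hypothetical data-averaging heuristic: maximize a penalized profile
    likelihood for each penalty weight alpha, then return the dimension
    selected most often across the grid of alphas."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]  # descending
    picks = []
    for alpha in alphas:
        scores = []
        for q in range(1, p):
            # Free parameters of a PPCA model with latent dimension q
            nparams = p * q - q * (q - 1) / 2 + 1
            scores.append(ppca_profile_loglik(eigvals, q, n)
                          - alpha * nparams * np.log(n))
        picks.append(int(np.argmax(scores)) + 1)
    vals, counts = np.unique(picks, return_counts=True)
    return int(vals[np.argmax(counts)])
```

On simulated data with a strong rank-2 signal plus unit noise, the modal choice across the penalty grid recovers the latent dimension; the averaging step is what removes the need to tune a single 'optimal' alpha.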
Pages: 266 - 297
Page count: 32
Related Papers
50 items in total
  • [41] Continuous estimation of distribution algorithms with probabilistic principal component analysis
    Cho, DY
    Zhang, BT
    PROCEEDINGS OF THE 2001 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1 AND 2, 2001, : 521 - 526
  • [42] A Principal Component Analysis Algorithm Based on Dimension Reduction Window
    Zhang, Rui
    Du, Tao
    Qu, Shouning
    IEEE ACCESS, 2018, 6 : 63737 - 63747
  • [43] Temporally Coupled Principal Component Analysis: A Probabilistic Autoregression Method
    Christmas, Jacqueline
    Everson, Richard
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [44] Probabilistic orthogonal-signal-corrected principal component analysis
    Lee, Geonseok
    Sim, Eunchan
    Yoon, Youngju
    Lee, Kichun
    KNOWLEDGE-BASED SYSTEMS, 2023, 268
  • [45] Visualizing probabilistic models and data with Intensive Principal Component Analysis
    Quinn, Katherine N.
    Clement, Colin B.
    De Bernardis, Francesco
    Niemack, Michael D.
    Sethna, James P.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2019, 116 (28) : 13762 - 13767
  • [46] Rapid speaker adaptation using probabilistic principal component analysis
    Kim, DK
    Kim, NS
    IEEE SIGNAL PROCESSING LETTERS, 2001, 8 (06) : 180 - 183
  • [47] Probabilistic Principal Component Analysis Based on JoyStick Probability Selector
    Jankovic, Marko V.
    Sugiyama, Masashi
    IJCNN: 2009 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1- 6, 2009, : 811 - 818
  • [48] Rapid speaker adaptation using probabilistic principal component analysis
    Dong Kook Kim
    Nam Soo Kim
IEEE SIGNAL PROCESSING LETTERS, 2001, 8 (06): 180 - 183
  • [49] On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis
    Datta, Arghya
    Chakrabarty, Sayak
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [50] A class of learning algorithms for principal component analysis and minor component analysis
    Zhang, QF
    Leung, YW
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2000, 11 (01): : 200 - 204