Exploring dimension learning via a penalized probabilistic principal component analysis

Times cited: 3
Authors
Deng, Wei Q. [1, 2]
Craiu, Radu V. [3]
Affiliations
[1] McMaster Univ, Dept Psychiat & Behav Neurosci, Hamilton, ON, Canada
[2] St Josephs Healthcare Hamilton, Peter Boris Ctr Addict Res, Hamilton, ON, Canada
[3] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Dimension estimation; model selection; penalization; principal component analysis; probabilistic principal component analysis; profile likelihood; SELECTION; COVARIANCE; NUMBER; EIGENVALUES; SHRINKAGE; TESTS;
DOI
10.1080/00949655.2022.2100890
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of probabilistic principal component analysis. Unlike other penalized maximization problems that require an 'optimal' penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared with a large number of alternative criteria in simulations and in an application to gene expression data. Extensive simulation studies reveal that none of the methods uniformly dominates the others and highlight the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when model assumptions deviate moderately.
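The abstract states the procedure only at a high level, so the following is a minimal Python sketch of one plausible reading, not the authors' method. Assumptions: the PPCA profile log-likelihood is the standard Tipping & Bishop (1999) closed form evaluated at the maximum-likelihood noise variance; the penalty is a hypothetical multiplier c times a parameter count d(k) = pk - k(k-1)/2 + 1; and "most favourable choice over a range of plausible penalty parameters" is taken to mean the dimension selected most often across a grid of c values. The function names and the BIC-scaled grid are illustrative only.

```python
import numpy as np
from collections import Counter


def ppca_profile_loglik(eigvals, n, k):
    """PPCA log-likelihood with k components, profiled over W and sigma^2.

    eigvals: eigenvalues of the (biased) sample covariance, sorted descending.
    Uses the Tipping & Bishop (1999) closed form; requires 0 <= k < p.
    """
    p = len(eigvals)
    sigma2 = eigvals[k:].mean()  # ML noise variance = mean of discarded eigenvalues
    return -0.5 * n * (
        np.sum(np.log(eigvals[:k]))
        + (p - k) * np.log(sigma2)
        + p * np.log(2 * np.pi)
        + p
    )


def n_params(p, k):
    # Hypothetical parameter count: loadings up to rotation plus sigma^2
    # (the mean is omitted because it is the same for every k).
    return p * k - k * (k - 1) // 2 + 1


def estimate_dimension(X, penalties):
    """For each penalty multiplier c, pick k maximising loglik - c * n_params;
    return the k chosen most often, plus the vote counts (assumed aggregation)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)  # ML sample covariance
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
    votes = []
    for c in penalties:
        crit = [ppca_profile_loglik(eigvals, n, k) - c * n_params(p, k) for k in range(p)]
        votes.append(int(np.argmax(crit)))
    counts = Counter(votes)
    return counts.most_common(1)[0][0], counts


if __name__ == "__main__":
    # Simulated data with a 3-dimensional signal plus isotropic noise.
    rng = np.random.default_rng(0)
    n, p, true_k = 500, 10, 3
    W = 3.0 * rng.standard_normal((p, true_k))
    X = rng.standard_normal((n, true_k)) @ W.T + rng.standard_normal((n, p))
    penalties = np.log(n) / 2 * np.linspace(1.0, 3.0, 21)  # hypothetical BIC-scaled grid
    k_hat, counts = estimate_dimension(X, penalties)
    print(k_hat, dict(counts))
```

Returning the vote counts alongside the modal dimension shows how stable the choice is across the penalty grid: a split vote signals that the selected dimension depends heavily on the penalty, which is exactly the sensitivity a single "optimal" tuning parameter would hide.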
Pages: 266-297
Page count: 32
Related papers
(50 in total)
  • [21] PROVABLE DIMENSION DETECTION USING PRINCIPAL COMPONENT ANALYSIS
    Cheng, Siu-Wing
    Wang, Yajun
    Wu, Zhuangzhi
    INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS, 2008, 18 (05) : 415 - 440
  • [22] Principal component analysis to reduce dimension on digital image
    Ng, S. C.
    8TH INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION TECHNOLOGY, 2017, 111 : 113 - 119
  • [23] OPTIMAL DIMENSION OF GRAPHICAL DISPLAYS FOR PRINCIPAL COMPONENT ANALYSIS
    FERRE, L
    COMPTES RENDUS DE L ACADEMIE DES SCIENCES SERIE I-MATHEMATIQUE, 1989, 309 (18) : 959 - 964
  • [24] Effect of dimension reduction by principal component analysis on clustering
    Erisoglu, Murat
    Erisoglu, Ulku
    JOURNAL OF STATISTICS AND MANAGEMENT SYSTEMS, 2011, 14 (02) : 277 - 287
  • [25] Deep Probabilistic Principal Component Analysis for Process Monitoring
    Kong, Xiangyin
    He, Yimeng
    Song, Zhihuan
    Liu, Tong
    Ge, Zhiqiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024 : 1 - 15
  • [26] Dynamic kernel probabilistic principal component analysis model
    Institute of Automation, Jiangnan University, Wuxi 214122, China
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2008, 48 (SUPPL.) : 1824 - 1828
  • [27] Competitive probabilistic principal component analysis neural networks
    Lopez-Rubio, E
    Ortiz-de-Lazcano-Lobato, JM
    PROCEEDINGS OF THE EIGHTH IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, 2004 : 380 - 385
  • [28] Probabilistic two-dimensional principal component analysis
    College of Information Science and Technology, East China University of Science and Technology, Shanghai 200237, China
    Zidonghua Xuebao, 2008, 3 (353-359):
  • [29] Generalized probabilistic principal component analysis of correlated data
    Gu, Mengyang
    Shen, Weining
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [30] Probabilistic kernel principal component analysis through time
    Alvarez, Mauricio
    Henao, Ricardo
    NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2006, 4232 : 747 - 754