Transformed low-rank ANOVA models for high-dimensional variable selection

Times cited: 9
Authors
Jung, Yoonsuh [1]
Zhang, Hong [2]
Hu, Jianhua [3]
Affiliations
[1] Korea Univ, Dept Stat, Seoul, South Korea
[2] Fudan Univ, Inst Biostat, Shanghai, Peoples R China
[3] Univ Texas MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
Funding
National Institutes of Health (USA); National Research Foundation, Singapore
Keywords
ANOVA; BIC; diverging number of parameters; high-dimensional variables; low rank; variable selection; CERVICAL-CANCER SUSCEPTIBILITY; REGRESSION; LASSO; GENE; CLASSIFICATION; POLYMORPHISM; ASSOCIATION; XRCC1; REGULARIZATION; LIKELIHOOD
DOI
10.1177/0962280217753726
Chinese Library Classification
R19 [Health care organization and services (health administration)]
Abstract
High-dimensional data are often encountered in biomedical, environmental, and other studies. For example, in biomedical studies that involve high-throughput omic data, an important problem is to search for genetic variables that are predictive of a particular phenotype. A conventional solution is to characterize such relationships through regression models in which the phenotype is treated as the response variable and the genetic variables as covariates; this approach becomes particularly challenging when the number of variables exceeds the number of samples. We propose a general framework that models the transformed mean of high-dimensional variables from an exponential family of distributions via ANOVA models, in which a low-rank interaction space captures the association between the phenotype and the variables. This alternative approach turns the variable selection problem into a well-posed problem in which the number of observations exceeds the number of variables. In addition, we propose a model selection criterion for this framework that accommodates a diverging number of parameters, and we establish its consistency. Simulations and real data analyses demonstrate the favorable performance of the proposed method in terms of prediction and detection accuracy.
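The core idea summarized in the abstract, treating the high-dimensional variables as responses whose means follow a two-way ANOVA with a low-rank phenotype-by-variable interaction, can be illustrated with a minimal sketch. It assumes the Gaussian case with an identity link, a categorical phenotype with equal group sizes, and a generic BIC for choosing the interaction rank. The helper names `fit_low_rank_anova` and `bic_rank` are hypothetical, and neither the estimator nor the criterion is claimed to be the authors' method or their consistent selection criterion.

```python
import numpy as np


def fit_low_rank_anova(X, groups, rank):
    """Hypothetical helper. Assumes Gaussian data, identity link, balanced groups:
    X[i, j] ~ mu + alpha_j + beta_{g(i)} + Gamma[g(i), j], with rank(Gamma) <= rank."""
    labels = np.unique(groups)
    # K x p matrix of group means (one row per phenotype group)
    M = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    grand = M.mean()                               # overall mean (balanced design)
    row = M.mean(axis=1, keepdims=True) - grand    # phenotype-group effects
    col = M.mean(axis=0, keepdims=True) - grand    # per-variable effects
    R = M - grand - row - col                      # double-centred interaction
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    gamma = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
    fitted_means = grand + row + col + gamma
    idx = np.searchsorted(labels, groups)          # map each sample to its group row
    rss = float(((X - fitted_means[idx]) ** 2).sum())
    return gamma, rss


def bic_rank(X, groups):
    """Hypothetical helper: pick the interaction rank with a generic Gaussian BIC."""
    n, p = X.shape
    K = len(np.unique(groups))
    N = n * p                                      # every (sample, variable) entry is an observation
    best = None
    for r in range(min(K - 1, p - 1) + 1):
        gamma, rss = fit_low_rank_anova(X, groups, r)
        # mean + variable effects + group effects + free parameters of a
        # rank-r (K-1) x (p-1) interaction matrix
        df = 1 + (p - 1) + (K - 1) + r * (K - 1 + p - 1 - r)
        bic = N * np.log(rss / N) + df * np.log(N)
        if best is None or bic < best[0]:
            best = (bic, r, gamma)
    return best[1], best[2]


# Illustrative simulated data: 3 phenotype groups, 200 variables,
# and only the first 5 variables interact with the phenotype (rank-1 signal).
rng = np.random.default_rng(0)
K, n_per, p = 3, 20, 200
groups = np.repeat(np.arange(K), n_per)
u = np.array([1.0, 0.0, -1.0])
v = np.zeros(p)
v[:5] = 5.0
X = 0.5 + np.outer(u, v)[groups] + rng.normal(size=(K * n_per, p))

r_hat, gamma = bic_rank(X, groups)
scores = np.linalg.norm(gamma, axis=0)             # interaction strength per variable
top = np.argsort(scores)[::-1][:5]
print(r_hat, sorted(top.tolist()))
```

Under these assumptions the balanced least-squares fit reduces to double-centering the group-mean matrix and truncating its SVD (Eckart-Young), so the rank is the only tuning choice; variables are then ranked by the norm of their column in the estimated interaction. The simulated data are purely illustrative.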
Pages: 1230-1246
Number of pages: 17