Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations

被引:0
|
作者
Sarkar R. [1 ]
Manage S. [2 ]
Gao X. [3 ]
机构
[1] Department of Mathematics and Statistics, University of North Carolina at Greensboro, 116 Petty Building, PO Box 26170, Greensboro, 27402, NC
[2] Department of Mathematics, Texas A&M University, Blocker Building, 3368 TAMU, 155 Ireland Street, College Station, 77840, TX
[3] Meta Platforms, Menlo Park, CA
基金
美国国家科学基金会;
关键词
Bi-level sparsity; Minimax concave penalty; Stability; Strong correlation; Variable selection;
D O I
10.1007/s40745-023-00481-5
中图分类号
学科分类号
摘要
High-dimensional genomic data studies are often found to exhibit strong correlations, which results in instability and inconsistency in the estimates obtained using commonly used regularization approaches including the Lasso and MCP, etc. In this paper, we perform comparative study of regularization approaches for variable selection under different correlation structures and propose a two-stage procedure named rPGBS to address the issue of stable variable selection in various strong correlation settings. This approach involves repeatedly running a two-stage hierarchical approach consisting of a random pseudo-group clustering and bi-level variable selection. Extensive simulation studies and high-dimensional genomic data analysis on real datasets have demonstrated the advantage of the proposed rPGBS method over some of the most used regularization methods. In particular, rPGBS results in more stable selection of variables across a variety of correlation settings, as compared to some recent methods addressing variable selection with strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS has been shown to be computationally efficient across various settings. © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
引用
收藏
页码:1139 / 1164
页数:25
相关论文
共 50 条
  • [1] A variable selection approach for highly correlated predictors in high-dimensional genomic data
    Zhu, Wencan
    Levy-Leduc, Celine
    Ternes, Nils
    BIOINFORMATICS, 2021, 37 (16) : 2238 - 2244
  • [2] Variable selection for high-dimensional incomplete data
    Liang, Lixing
    Zhuang, Yipeng
    Yu, Philip L. H.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2024, 192
  • [3] High-Dimensional Variable Selection for Survival Data
    Ishwaran, Hemant
    Kogalur, Udaya B.
    Gorodeski, Eiran Z.
    Minn, Andy J.
    Lauer, Michael S.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2010, 105 (489) : 205 - 217
  • [4] Bayesian variable selection in clustering high-dimensional data
    Tadesse, MG
    Sha, N
    Vannucci, M
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2005, 100 (470) : 602 - 617
  • [5] VARIABLE SELECTION AND PREDICTION WITH INCOMPLETE HIGH-DIMENSIONAL DATA
    Liu, Ying
    Wang, Yuanjia
    Feng, Yang
    Wall, Melanie M.
    ANNALS OF APPLIED STATISTICS, 2016, 10 (01): : 418 - 450
  • [6] Bayesian variable selection for high-dimensional rank data
    Cui, Can
    Singh, Susheela P.
    Staicu, Ana-Maria
    Reich, Brian J.
    ENVIRONMETRICS, 2021, 32 (07)
  • [7] A Variable Selection Method for High-Dimensional Survival Data
    Giordano, Francesco
    Milito, Sara
    Restaino, Marialuisa
    MATHEMATICAL AND STATISTICAL METHODS FOR ACTUARIAL SCIENCES AND FINANCE, MAF 2022, 2022, : 303 - 308
  • [8] HIGH-DIMENSIONAL VARIABLE SELECTION
    Wasserman, Larry
    Roeder, Kathryn
    ANNALS OF STATISTICS, 2009, 37 (5A): : 2178 - 2201
  • [9] Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior
    Lee, Kyu Ha
    Chakraborty, Sounak
    Sun, Jianguo
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2017, 112 : 1 - 13
  • [10] Exploiting the ensemble paradigm for stable feature selection: A case study on high-dimensional genomic data
    Pes, Barbara
    Dessi, Nicoletta
    Angioni, Marta
    INFORMATION FUSION, 2017, 35 : 132 - 147