Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations

Cited by: 0
Authors
Sarkar R. [1]
Manage S. [2]
Gao X. [3]
Affiliations
[1] Department of Mathematics and Statistics, University of North Carolina at Greensboro, 116 Petty Building, PO Box 26170, Greensboro, 27402, NC
[2] Department of Mathematics, Texas A&M University, Blocker Building, 3368 TAMU, 155 Ireland Street, College Station, 77840, TX
[3] Meta Platforms, Menlo Park, CA
Funding
National Science Foundation (USA);
Keywords
Bi-level sparsity; Minimax concave penalty; Stability; Strong correlation; Variable selection;
DOI
10.1007/s40745-023-00481-5
Abstract
High-dimensional genomic data often exhibit strong correlations, which lead to unstable and inconsistent estimates from commonly used regularization approaches such as the Lasso and MCP. In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures and propose a two-stage procedure, named rPGBS, for stable variable selection in various strong-correlation settings. The approach repeatedly runs a two-stage hierarchical procedure consisting of random pseudo-group clustering and bi-level variable selection. Extensive simulation studies and analyses of real high-dimensional genomic datasets demonstrate the advantage of the proposed rPGBS method over some of the most widely used regularization methods. In particular, rPGBS selects variables more stably across a variety of correlation settings than recent methods that address variable selection under strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS is shown to be computationally efficient across various settings. © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
Pages: 1139-1164
Number of pages: 25