Feature-specific inference for penalized regression using local false discovery rates

Cited by: 1
Authors
Miller, Ryan [1]
Breheny, Patrick [2]
Affiliations
[1] Grinnell Coll, Dept Math, Grinnell, IA 50112 USA
[2] Univ Iowa, Dept Biostat, Iowa City, IA USA
Keywords
false discovery rates; high-dimensional data; high-dimensional models; lasso; penalized regression; confidence intervals; p-values; selection
DOI
10.1002/sim.9678
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Penalized regression methods such as the lasso are a popular approach to analyzing high-dimensional data. One attractive property of the lasso is that it naturally performs variable selection. An important concern, however, is the reliability of these selections. Motivated by local false discovery rate methodology from the large-scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature or to estimate the model's overall false discovery rate. The method can be used at any level of regularization. This is particularly useful for models with a few highly significant features but a high overall false discovery rate, a relatively common occurrence when cross-validation is used to select a model. It is also flexible enough to be applied to many varieties of penalized likelihood, including generalized linear models and Cox regression, and to a variety of penalties, including the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) penalty. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression, as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to a case study involving gene expression in breast cancer patients.
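The abstract's central idea, a local false discovery rate attached to each feature that can also be averaged into an overall false discovery rate for a selected set, can be illustrated with a minimal sketch of the generic two-group model from the large-scale testing literature. This is not the paper's lasso-specific procedure. It assumes one statistic per feature that is approximately N(0, 1) under the null, takes the theoretical null f0 to be the standard normal, estimates the mixture density f with a Gaussian kernel density estimate (Silverman's rule-of-thumb bandwidth), and conservatively sets the null proportion pi0 = 1. The helper names (`kernel_density`, `local_fdr`, `estimated_fdr`) are hypothetical and for illustration only.

```python
import math
import random
from typing import Callable, List, Sequence


def normal_pdf(z: float) -> float:
    """Standard normal density, used here as the theoretical null density f0."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_density(zs: Sequence[float]) -> Callable[[float], float]:
    """Gaussian kernel density estimate of the mixture density f(z).

    The Silverman rule-of-thumb bandwidth is an assumption of this sketch.
    """
    n = len(zs)
    if n < 2:
        raise ValueError("need at least two statistics")
    mean = sum(zs) / n
    sd = math.sqrt(sum((z - mean) ** 2 for z in zs) / (n - 1))
    if sd == 0.0:
        raise ValueError("statistics have zero variance")
    h = 1.06 * sd * n ** (-0.2)

    def f(z: float) -> float:
        return sum(normal_pdf((z - zi) / h) for zi in zs) / (n * h)

    return f


def local_fdr(zs: Sequence[float], pi0: float = 1.0) -> List[float]:
    """Two-group local fdr: fdr(z) = pi0 * f0(z) / f(z), capped at 1.

    pi0 = 1 is conservative: it can only overstate each local fdr.
    """
    f = kernel_density(zs)
    return [min(1.0, pi0 * normal_pdf(z) / f(z)) for z in zs]


def estimated_fdr(lfdr: Sequence[float], selected: Sequence[int]) -> float:
    """Estimate the FDR of a selected set as the mean local fdr over that set."""
    if not selected:
        return 0.0
    return sum(lfdr[j] for j in selected) / len(selected)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated statistics (hypothetical): 950 null and 50 non-null features.
    zs = [rng.gauss(0.0, 1.0) for _ in range(950)]
    zs += [rng.gauss(4.0, 1.0) for _ in range(50)]
    lfdr = local_fdr(zs)
    selected = [j for j, z in enumerate(zs) if abs(z) > 3.0]
    print(f"{len(selected)} features selected")
    print(f"estimated FDR of selection: {estimated_fdr(lfdr, selected):.3f}")
    # Per-feature rates separate the most reliable selections from borderline ones.
    for j in sorted(selected, key=lambda k: lfdr[k])[:3]:
        print(f"feature {j}: z = {zs[j]:.2f}, local fdr = {lfdr[j]:.4f}")
```

Averaging local fdr values over a selected set is the standard way to turn per-feature rates into an estimate of that set's overall false discovery rate, which mirrors the two uses described in the abstract. In the paper's setting the per-feature statistics would instead be tied to the penalized fit; that construction is described in the paper and is not reproduced here.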
Pages: 1412-1429
Page count: 18
Related papers
50 in total
  • [31] Feature-specific nutrient management of onion (Allium cepa) using machine learning and compositional methods
    Hahn, Leandro
    Kurtz, Claudinei
    de Paula, Betania Vahl
    Feltrim, Anderson Luiz
    Higashikawa, Fabio Satoshi
    Moreira, Camila
    Rozane, Danilo Eduardo
    Brunetto, Gustavo
    Parent, Leon-Etienne
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [32] Local feature selection using Gaussian process regression
    Pichara, Karim
    Soto, Alvaro
    INTELLIGENT DATA ANALYSIS, 2014, 18 (03) : 319 - 336
  • [33] M-regression, false discovery rates and outlier detection with application to genetic association studies
    Lourenco, V. M.
    Pires, A. M.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 78 : 33 - 42
  • [34] Deep neural network-based feature selection with local false discovery rate estimation
    Cao, Zixuan
    Sun, Xiaoya
    Fu, Yan
    APPLIED INTELLIGENCE, 2025, 55 (01)
  • [35] False discovery rate revisited: FDR and topological inference using Gaussian random fields
    Chumbley, Justin R.
    Friston, Karl J.
    NEUROIMAGE, 2009, 44 (01) : 62 - 70
  • [36] Maximizing Interpretability and Cost-Effectiveness of Surgical Site Infection (SSI) Predictive Models Using Feature-Specific Regularized Logistic Regression on Preoperative Temporal Data
    Kocbek, Primoz
    Fijacko, Nino
    Soguero-Ruiz, Cristina
    Mikalsen, Karl Oyvind
    Maver, Uros
    Brzan, Petra Povalej
    Stozer, Andraz
    Jenssen, Robert
    Skrovseth, Stein Olav
    Stiglic, Gregor
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2019, 2019
  • [37] Nonlinear fitting method for determining local false discovery rates from decoy database searches
    Tang, Wilfred H.
    Shilov, Ignat V.
    Seymour, Sean L.
    JOURNAL OF PROTEOME RESEARCH, 2008, 7 (09) : 3661 - 3667
  • [38] Estimating false discovery rates for peptide and protein identification using randomized databases
    Hather, Gregory
    Higdon, Roger
    Bauman, Andrew
    von Haller, Priska D.
    Kolker, Eugene
    PROTEOMICS, 2010, 10 (12) : 2369 - 2376
  • [39] Peptide identifications and false discovery rates using different mass spectrometry platforms
    Anapindi, Krishna D. B.
    Romanova, Elena V.
    Southey, Bruce R.
    Sweedler, Jonathan V.
    TALANTA, 2018, 182 : 456 - 463
  • [40] fdrtool: a versatile R package for estimating local and tail area-based false discovery rates
    Strimmer, Korbinian
    BIOINFORMATICS, 2008, 24 (12) : 1461 - 1462