Feature-specific inference for penalized regression using local false discovery rates

Cited by: 1
Authors
Miller, Ryan [1]
Breheny, Patrick [2]
Affiliations
[1] Grinnell Coll, Dept Math, Grinnell, IA 50112 USA
[2] Univ Iowa, Dept Biostat, Iowa City, IA USA
Keywords
false discovery rates; high-dimensional data; high-dimensional models; lasso; penalized regression; CONFIDENCE-INTERVALS; P-VALUES; SELECTION
DOI
10.1002/sim.9678
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Penalized regression methods such as the lasso are a popular approach to analyzing high-dimensional data. One attractive property of the lasso is that it naturally performs variable selection. An important area of concern, however, is the reliability of these selections. Motivated by local false discovery rate methodology from the large-scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature, or to estimate the model's overall false discovery rate. The method can be used for any level of regularization. This is particularly useful for models with a few highly significant features but a high overall false discovery rate, a relatively common occurrence when using cross validation to select a model. It is also flexible enough to be applied to many varieties of penalized likelihoods including generalized linear models and Cox regression, and a variety of penalties, including the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to a case study involving gene expression in breast cancer patients.
Pages: 1412 - 1429
Page count: 18
Related papers
50 in total
  • [41] A Refined Method To Calculate False Discovery Rates for Peptide Identification Using Decoy Databases
    Navarro, Pedro
    Vazquez, Jesus
    JOURNAL OF PROTEOME RESEARCH, 2009, 8 (04) : 1792 - 1796
  • [42] A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data
    Sen Puliparambil, Bhavithry
    Tomal, Jabed H.
    Yan, Yan
    BIOLOGY-BASEL, 2022, 11 (10):
  • [43] Automated feature-specific tree species identification from natural images using deep semi-supervised learning
    Homan, Dewald
    du Preez, Johan A.
    ECOLOGICAL INFORMATICS, 2021, 66
  • [44] BAYESIAN INFERENCE OF FINITE POPULATION QUANTILES FOR SKEWED SURVEY DATA USING SKEW-NORMAL PENALIZED SPLINE REGRESSION
    Liu, Yutao
    Chen, Qixuan
    JOURNAL OF SURVEY STATISTICS AND METHODOLOGY, 2020, 8 (04) : 792 - 816
  • [45] Sentiment analysis from travellers' reviews using enhanced conjunction rule based approach for feature-specific evaluation of hotels
    Maity, Aranyak
    Ghosh, Sritama
    Karfa, Saikat
    Mukhopadhyay, Moutan
    Pal, Saurabh
    Pramanik, Pijush Kanti Dutta
    JOURNAL OF STATISTICS & MANAGEMENT SYSTEMS, 2020, 23 (06) : 983 - 997
  • [46] A Feature-specific Probabilistic Assessment of Pipeline Defect Size from ILI MFL Signal Using Convolutional Neural Network
    Chen, Jenny Ling
    Westwood, Stephen
    Heaney, David
    PROCEEDINGS OF THE ASME 2020 13TH INTERNATIONAL PIPELINE CONFERENCE (IPC2020), VOL 1, 2020,
  • [47] Specific Comic Character Detection Using Local Feature Matching
    Sun, Weihan
    Burie, Jean-Christophe
    Ogier, Jean-Marc
    Kise, Koichi
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013 : 275 - 279
  • [48] Inference on phenotype-specific effects of genes using multivariate kernel machine regression
    Maity, Arnab
    Zhao, Jing
    Sullivan, Patrick F.
    Tzeng, Jung-Ying
    GENETIC EPIDEMIOLOGY, 2018, 42 (01) : 64 - 79
  • [49] A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk
    Yurko, Ronald
    G'Sell, Max
    Roeder, Kathryn
    Devlin, Bernie
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2020, 117 (26) : 15028 - 15035
  • [50] Estimating monotonic rates from biological data using local linear regression
    Olito, Colin
    White, Craig R.
    Marshall, Dustin J.
    Barneche, Diego R.
    JOURNAL OF EXPERIMENTAL BIOLOGY, 2017, 220 (05) : 759 - 764