VARIABLE SELECTION METHOD FOR THE IDENTIFICATION OF EPISTATIC MODELS

Cited by: 0
Authors
Holzinger, Emily Rose [1]
Szymczak, Silke [1]
Dasgupta, Abhijit [2]
Malley, James [3]
Li, Qing [1]
Bailey-Wilson, Joan E. [1]
Affiliations
[1] NHGRI, Computat & Stat Genom Branch, NIH, Baltimore, MD 21224 USA
[2] NIAMS, Clin Trials & Outcomes Branch, NIH, Bethesda, MD 20892 USA
[3] NIH, Ctr Informat Technol, Bethesda, MD 20892 USA
Keywords
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Standard analysis methods for genome-wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e., no marginal effects). Our results show that, with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/).
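The abstract does not spell out how recurrency and variance estimation are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each Random Forest run yields one importance score per variable, that the magnitude of the most negative score in a run serves as a rough estimate of noise, and that a variable is kept only if its scaled score meets a threshold in every run. The function names, the toy scores, and the threshold of 3.0 are all hypothetical.

```python
from typing import Dict, List


def relative_importances(run: Dict[str, float]) -> Dict[str, float]:
    """Scale one run's scores by the magnitude of its most negative score.

    Assumption: the most negative importance approximates the noise level.
    """
    lowest = min(run.values())
    if lowest >= 0:
        raise ValueError("need at least one negative score to estimate noise")
    scale = abs(lowest)
    return {name: score / scale for name, score in run.items()}


def select_recurrent(runs: List[Dict[str, float]], threshold: float) -> List[str]:
    """Return variables whose scaled importance meets the threshold in every run."""
    if not runs:
        return []
    scaled = [relative_importances(run) for run in runs]
    # Only variables scored in all runs can be recurrent.
    names = set(scaled[0])
    for result in scaled[1:]:
        names &= set(result)
    return sorted(
        name for name in names
        if all(result[name] >= threshold for result in scaled)
    )


# Hypothetical importance scores from three independent forest runs.
runs = [
    {"snp1": 0.030, "snp2": 0.024, "snp3": 0.004, "snp4": -0.006},
    {"snp1": 0.027, "snp2": 0.020, "snp3": -0.005, "snp4": 0.012},
    {"snp1": 0.033, "snp2": 0.018, "snp3": 0.002, "snp4": -0.004},
]
selected = select_recurrent(runs, threshold=3.0)
print(selected)
```

In this toy input, snp4 scores well in one run only, so requiring recurrence across runs filters it out; raising the threshold trades power for fewer false positives, which is the tension the abstract describes.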
Pages: 195-206
Page count: 12