VARIABLE SELECTION METHOD FOR THE IDENTIFICATION OF EPISTATIC MODELS

Cited by: 0
Authors
Holzinger, Emily Rose [1]
Szymczak, Silke [1]
Dasgupta, Abhijit [2]
Malley, James [3]
Li, Qing [1]
Bailey-Wilson, Joan E. [1]
Affiliations
[1] NHGRI, Computat & Stat Genom Branch, NIH, Baltimore, MD 21224 USA
[2] NIAMS, Clin Trials & Outcomes Branch, NIH, Bethesda, MD 20892 USA
[3] NIH, Ctr Informat Technol, Bethesda, MD 20892 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Standard analysis methods for genome-wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat of RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e., no marginal effects). Our results show that with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/).
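The abstract says only that r2VIM combines recurrency with variance estimation to choose a threshold; it does not give the procedure. The following is a minimal sketch of one way such a rule could look, not the authors' implementation. It assumes that importance scores from several independent forest runs are already available, that the magnitude of the most negative score in a run serves as a rough noise estimate, and that a variable is kept only if its scaled score clears a threshold in every run. The function names, the `factor` parameter, and the example scores are all hypothetical.

```python
def relative_importance(vim):
    """Scale one run's importance scores by the magnitude of its most negative score.

    Assumption: the most negative score approximates the spread of pure noise.
    """
    baseline = -min(vim)
    if baseline <= 0:
        raise ValueError("run has no negative importance score to use as a noise baseline")
    return [v / baseline for v in vim]


def select_recurrent(vim_runs, factor=3.0):
    """Return indices of variables whose relative importance is >= factor in every run."""
    n_vars = len(vim_runs[0])
    relative = [relative_importance(run) for run in vim_runs]
    return [j for j in range(n_vars) if all(run[j] >= factor for run in relative)]


# Hypothetical importance scores: three forest runs (rows) over five variables (columns).
runs = [
    [0.90, 0.02, -0.10, 0.50, 0.01],
    [0.80, 0.35, -0.08, 0.45, -0.02],
    [1.00, -0.05, -0.12, 0.60, 0.03],
]

selected = select_recurrent(runs, factor=3.0)
print(selected)
```

In this toy input, the second variable clears the threshold in only one of the three runs, so the recurrency requirement excludes it. Raising `factor` trades power for fewer false positives, which is the tuning decision the abstract refers to as threshold selection.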
Pages: 195-206
Number of pages: 12