Identification of influential rare variants in aggregate testing using random forest importance measures

Authors
Blumhagen, Rachel Z. [1, 2]
Schwartz, David A. [3]
Langefeld, Carl D. [4, 5, 6]
Fingerlin, Tasha E. [1, 2, 3]
Affiliations
[1] Natl Jewish Hlth, Ctr Genes Environm & Hlth, Denver, CO 80206 USA
[2] Colorado Sch Publ Hlth, Dept Biostat & Informat, Aurora, CO USA
[3] Univ Colorado, Sch Med, Aurora, CO USA
[4] Wake Forest Sch Med, Dept Biostat & Data Sci, Winston Salem, NC USA
[5] Wake Forest Baptist Med Ctr, Comprehens Canc Ctr, Winston Salem, NC USA
[6] Wake Forest Sch Med, Ctr Precis Med, Winston Salem, NC USA
Keywords
genetic association; idiopathic pulmonary fibrosis; random forest; rare variants; targeted sequencing; TERT promoter mutations
DOI
10.1111/ahg.12509
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Aggregate tests of rare variants are often used to identify associated regions, as an alternative to testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are "driving" the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33), and both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study in idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight variants in TERT and seven in FAM13A. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
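The abstract compares methods by their true positive rate (TPR) and false positive rate, summarized as a median and interquartile range. The sketch below shows one plausible way such a summary could be computed across simulation replicates. It is not the authors' code: the function names `rates` and `summarize`, the variant labels, and the replicate data are all hypothetical and chosen purely for illustration, and the quartile convention (`method="inclusive"`) is an assumption.

```python
from statistics import median, quantiles


def rates(flagged, causal, all_variants):
    """Return (TPR, FPR) for one replicate.

    TPR = flagged causal variants / all causal variants
    FPR = flagged non-causal variants / all non-causal variants
    """
    flagged, causal, all_variants = set(flagged), set(causal), set(all_variants)
    non_causal = all_variants - causal
    tpr = len(flagged & causal) / len(causal) if causal else float("nan")
    fpr = len(flagged & non_causal) / len(non_causal) if non_causal else float("nan")
    return tpr, fpr


def summarize(values):
    """Return (median, (Q1, Q3)) for a list of per-replicate rates."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical toy data: 10 variants, 4 of them truly causal.
all_variants = [f"v{i}" for i in range(1, 11)]
causal = ["v1", "v2", "v3", "v4"]

# Variants flagged as influential in each of 5 made-up replicates.
replicates = [
    ["v1", "v2", "v5"],
    ["v1"],
    ["v1", "v2", "v3", "v7", "v8"],
    [],
    ["v1", "v2", "v3", "v4"],
]

tprs = [rates(flagged, causal, all_variants)[0] for flagged in replicates]
print(summarize(tprs))  # median 0.5, IQR (0.25, 0.75)
```

Reporting the median with the IQR, rather than a mean, keeps the summary robust when per-replicate rates are skewed, which is common when few causal variants are present.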
Pages: 184-195
Page count: 12