Iterative hard thresholding for model selection in genome-wide association studies

Cited by: 4
Authors
Keys, Kevin L. [1]
Chen, Gary K. [2]
Lange, Kenneth [3, 4, 5]
Affiliations
[1] Univ Calif San Francisco, Dept Med, Box 2911, San Francisco, CA 94158 USA
[2] Univ Southern Calif, Div Biostat, Los Angeles, CA USA
[3] Univ Calif Los Angeles, Dept Biomath, Los Angeles, CA USA
[4] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA USA
[5] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA USA
Funding
National Science Foundation (USA);
Keywords
genetic association studies; greedy algorithm; parallel computing; sparse regression; COORDINATE DESCENT ALGORITHMS; SIGNAL RECOVERY; QUANTITATIVE TRAITS; VARIABLE SELECTION; LINEAR-MODELS; GENE LEVEL; COMMON; REGRESSION; SHRINKAGE; VARIANTS;
DOI
10.1002/gepi.22068
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. Here, we compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. Availability: Source code is freely available at https://github.com/klkeys/IHT.jl.
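For readers unfamiliar with IHT, the following is a minimal illustrative sketch of the basic idea: take a gradient step on the least-squares loss, then keep only the k largest-magnitude coefficients. It is not the authors' IHT.jl implementation. The function names (`hard_threshold`, `iht`), the fixed step size derived from the squared Frobenius norm of X, and the tiny orthogonal design matrix are all assumptions made for illustration.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht(X, y, k, n_iter=200):
    """Sketch of IHT for min 0.5*||y - X beta||^2 subject to at most k nonzeros.

    Assumption: a fixed step size 1/L, where L is the squared Frobenius norm
    of X (an upper bound on the largest eigenvalue of X^T X).
    """
    n, p = len(X), len(X[0])
    step = 1.0 / sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residual r = y - X beta
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Negative gradient of the loss: X^T r
        neg_grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto the set of k-sparse vectors
        beta = hard_threshold([beta[j] + step * neg_grad[j] for j in range(p)], k)
    return beta


# Toy design with orthonormal columns (hypothetical data, not genotypes)
X = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
beta_true = [3.0, 0.0, -2.0, 0.0]
y = [sum(X[i][j] * beta_true[j] for j in range(4)) for i in range(4)]

beta_hat = iht(X, y, k=2)
support = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(support, beta_hat)
```

On this noiseless toy problem the sketch recovers the two nonzero coefficients. Real GWAS data would call for standardized genotypes, a convergence check, and a data-driven choice of k, none of which are shown here.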
Pages: 756 - 768
Page count: 13
Related papers
50 in total
  • [1] Multivariate genome-wide association analysis by iterative hard thresholding
    Chu, Benjamin B.
    Ko, Seyoon
    Zhou, Jin J.
    Jensen, Aubrey
    Zhou, Hua
    Sinsheimer, Janet S.
    Lange, Kenneth
    [J]. BIOINFORMATICS, 2023, 39 (04)
  • [2] Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity
    Chu, Benjamin B.
    Keys, Kevin L.
    German, Christopher A.
    Zhou, Hua
    Zhou, Jin J.
    Sobel, Eric M.
    Sinsheimer, Janet S.
    Lange, Kenneth
    [J]. GIGASCIENCE, 2020, 9 (06):
  • [3] Model Selection Strategies in Genome-Wide Association Studies
    Keildson, Sarah L.
    Farrall, Martin
    Morris, Andrew P.
    [J]. GENETIC EPIDEMIOLOGY, 2009, 33 (08) : 792 - 792
  • [4] Testing and genetic model selection in genome-wide association studies
    Loley, Christina
    Koenig, Inke R.
    Hothorn, Ludwig
    Ziegler, Andreas
    [J]. ANNALS OF HUMAN GENETICS, 2012, 76 : 420 - 420
  • [5] Testing and Genetic Model Selection in Genome-Wide Association Studies
    Loley, Christina
    Konig, Inke R.
    Hothorn, Ludwig
    Ziegler, Andreas
    [J]. GENETIC EPIDEMIOLOGY, 2012, 36 (02) : 149 - 149
  • [6] An efficient unified model for genome-wide association studies and genomic selection
    Li, Hengde
    Su, Guosheng
    Jiang, Li
    Bao, Zhenmin
    [J]. GENETICS SELECTION EVOLUTION, 2017, 49
  • [7] Statistical Power of Model Selection Strategies for Genome-Wide Association Studies
    Wu, Zheyang
    Zhao, Hongyu
    [J]. PLOS GENETICS, 2009, 5 (07):
  • [8] An efficient unified model for genome-wide association studies and genomic selection
    Hengde Li
    Guosheng Su
    Li Jiang
    Zhenmin Bao
    [J]. Genetics Selection Evolution, 49
  • [9] Bayesian Variable Selection with Genome-wide Association Studies
    Bangchang, Kannat Na
    [J]. LOBACHEVSKII JOURNAL OF MATHEMATICS, 2024, 45 (02) : 613 - 620
  • [10] A variable selection method for genome-wide association studies
    He, Qianchuan
    Lin, Dan-Yu
    [J]. BIOINFORMATICS, 2011, 27 (01) : 1 - 8