Bayesian hierarchical hypothesis testing in large-scale genome-wide association analysis

Authors
Samaddar, Anirban [1]
Maiti, Tapabrata [1]
de los Campos, Gustavo [1, 2, 3]
Affiliations
[1] Michigan State Univ, Dept Stat & Probabil, E Lansing, MI 48824 USA
[2] Michigan State Univ, Dept Epidemiol & Biostat, E Lansing, MI 48824 USA
[3] Michigan State Univ, Inst Quantitat Hlth Sci & Engn, E Lansing, MI 48824 USA
Keywords
Bayesian variable selection; Bayesian hierarchical hypothesis testing; false discovery rate; GWAS; collinearity; multiresolution inference; spike and slab prior; linkage disequilibrium; UK-Biobank data; variable selection; regression; heritability; prediction
DOI
10.1093/genetics/iyae164
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Variable selection and large-scale hypothesis testing are commonly used to analyze high-dimensional genomic data. Despite recent advances in theory and methodology, variable selection and inference with highly collinear features remain challenging. Collinearity is a particular problem in genome-wide association studies (GWAS), which involve millions of variants, many of them in high linkage disequilibrium. In such settings, collinearity can substantially reduce the power of variable selection methods to identify individual variants associated with an outcome. To address this challenge, we developed Bayesian hierarchical hypothesis testing (BHHT), a novel multiresolution testing procedure that offers high power with adequate error control and fine-mapping resolution. We demonstrate through simulations that the proposed methodology has a power-FDR performance that is competitive with (and in many scenarios better than) state-of-the-art methods. Finally, we demonstrate the feasibility of using BHHT with a large sample size (n ≈ 300,000) and ultra-high-dimensional genotypes (about 15 million single-nucleotide polymorphisms, or SNPs) by applying it to eight complex traits using data from the UK Biobank. Our results show that the proposed methodology leads to many more discoveries than traditional SNP-centered inference procedures. The article is accompanied by open-source software that implements the methods described in this study using algorithms that scale to biobank-size, ultra-high-dimensional data.
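The point about collinearity reducing SNP-level power can be illustrated with a small sketch. This is not the authors' software or the BHHT algorithm itself; it only shows, under stated assumptions, why inference on a set of correlated SNPs can succeed where SNP-centered inference fails. The function names (`set_pip`, `bayesian_fdr_select`), the toy indicator matrix, and the 0.05 threshold are all hypothetical and chosen for illustration. The sketch assumes posterior samples of inclusion indicators (for example, from a spike-and-slab model) are already available.

```python
import numpy as np

def set_pip(indicators, snp_set):
    """Posterior probability that at least one SNP in snp_set is in the model.

    indicators: boolean array of shape (n_samples, n_snps), one row per
    posterior sample (assumed to come from some variable-selection sampler).
    """
    return float(indicators[:, snp_set].any(axis=1).mean())

def bayesian_fdr_select(pips, alpha=0.05):
    """Return indices of the largest set of discoveries whose average
    local FDR (1 - PIP) does not exceed alpha."""
    pips = np.asarray(pips, dtype=float)
    order = np.argsort(-pips)  # most probable discoveries first
    running_fdr = np.cumsum(1.0 - pips[order]) / np.arange(1, len(pips) + 1)
    n_keep = int(np.sum(running_fdr <= alpha))  # running_fdr is nondecreasing
    return order[:n_keep]

# Toy posterior: SNPs 0 and 1 are in high LD, so the sampler alternates
# between them; SNP 2 is never selected. (Hypothetical data.)
indicators = np.zeros((10, 3), dtype=bool)
indicators[0::2, 0] = True
indicators[1::2, 1] = True

snp_pips = [set_pip(indicators, [j]) for j in range(3)]       # SNP-level
group_pips = [set_pip(indicators, [0, 1]), set_pip(indicators, [2])]  # set-level

snp_hits = bayesian_fdr_select(snp_pips)      # no individual SNP passes
group_hits = bayesian_fdr_select(group_pips)  # the {0, 1} set passes
```

In this toy case the signal is split evenly between the two correlated SNPs, so neither is discovered individually, while the pair is discovered as a set. A multiresolution procedure would then try to narrow such a set as far as the error-control threshold allows.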
Pages: 12