Cross-trait prediction accuracy of summary statistics in genome-wide association studies

Cited by: 1
Authors
Zhao, Bingxin [1 ]
Zou, Fei [1 ]
Zhu, Hongtu [1 ]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Keywords
BLUP; GWAS; high-dimension prediction; marginal estimator; polygenic risk score; ridge-type estimator; REGRESSION; REGULARIZATION; SELECTION; SCORES; RISK;
DOI
10.1111/biom.13661
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
In the era of big data, univariate models are widely used as a workhorse tool for quickly producing marginal estimators, even in high-dimensional dense settings in which many features are "true" but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimator is popular, it has long been criticized for ignoring the correlation structure of genetic variants (i.e., the linkage disequilibrium [LD] pattern). In this paper, we study the effects of the LD pattern on the GWAS marginal estimator and investigate whether additionally accounting for LD can improve the prediction accuracy of complex traits. We consider a general high-dimensional dense setting for GWAS and study a class of ridge-type estimators that includes the popular marginal estimator and the best linear unbiased prediction (BLUP) estimator as two special cases. We show that the performance of the GWAS marginal estimator depends on the LD pattern through the first three moments of its eigenvalue distribution. Furthermore, we find that the relative performance of the GWAS marginal and BLUP estimators depends strongly on the ratio of the GWAS sample size to the number of genetic variants. In particular, the marginal estimator can easily become near-optimal within this class when the sample size is relatively small, even though it ignores the LD pattern. In contrast, the BLUP estimator performs substantially better than the marginal estimator as the sample size increases toward the number of genetic variants, which is typically in the millions. Therefore, adjusting for LD (as the BLUP does) is most needed when the GWAS sample size is large. We illustrate the importance of our results using simulated data and real GWAS data.
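The two estimators contrasted in the abstract can be sketched on toy data. The following is a minimal illustration, not the authors' code or simulation design: the AR(1) correlation matrix standing in for LD, the dimensions, the variance parameters, and all function names are assumptions made here. It uses the textbook forms, with the marginal estimator taken as X'y / n and the BLUP as the ridge solution whose penalty is the ratio of noise variance to per-variant effect variance.

```python
import numpy as np

def ar1_cholesky(p, rho):
    """Cholesky factor of an AR(1) correlation matrix (assumed toy stand-in for LD)."""
    idx = np.arange(p)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    return np.linalg.cholesky(sigma)

def draw(n, chol, beta, sigma_e2, rng):
    """Simulate n samples of correlated features and a dense linear trait."""
    x = rng.standard_normal((n, chol.shape[0])) @ chol.T
    y = x @ beta + np.sqrt(sigma_e2) * rng.standard_normal(n)
    return x, y

def marginal_estimator(x, y):
    """Univariate (marginal) effects: ignores correlation between features."""
    return x.T @ y / x.shape[0]

def ridge_estimator(x, y, penalty):
    """Ridge-type estimator (X'X + penalty * I)^{-1} X'y."""
    p = x.shape[1]
    return np.linalg.solve(x.T @ x + penalty * np.eye(p), x.T @ y)

def prediction_corr(beta_hat, x_new, y_new):
    """Correlation between predicted and observed trait on held-out data."""
    return float(np.corrcoef(x_new @ beta_hat, y_new)[0, 1])

rng = np.random.default_rng(0)
p, rho, sigma_e2 = 100, 0.5, 1.0       # assumed toy settings
sigma_b2 = 1.0 / p                      # per-variant effect variance (dense signal)
chol = ar1_cholesky(p, rho)
beta = np.sqrt(sigma_b2) * rng.standard_normal(p)
x_new, y_new = draw(1000, chol, beta, sigma_e2, rng)

results = {}
for n in (50, 500):                     # small and large sample size relative to p
    x, y = draw(n, chol, beta, sigma_e2, rng)
    blup = ridge_estimator(x, y, sigma_e2 / sigma_b2)  # BLUP-style penalty
    results[n] = {
        "marginal": prediction_corr(marginal_estimator(x, y), x_new, y_new),
        "blup": prediction_corr(blup, x_new, y_new),
    }
    print(n, results[n])
```

As the penalty grows, the ridge solution becomes proportional to X'y, so the marginal estimator sits at one end of the ridge-type class; prediction correlation is unaffected by that rescaling. Varying `n` relative to `p` in this sketch gives a rough way to explore the sample-size dependence the abstract describes, although a toy run of this size should not be read as reproducing the paper's results.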
Pages: 841 - 853
Number of pages: 13
Related papers
50 in total
  • [31] Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies
    Benner, Christian
    Havulinna, Aki S.
    Jarvelin, Marjo-Riitta
    Salomaa, Veikko
    Ripatti, Samuli
    Pirinen, Matti
    AMERICAN JOURNAL OF HUMAN GENETICS, 2017, 101 (04) : 539 - 551
  • [32] Shared Genetic Architecture Between Schizophrenia and Anorexia Nervosa: A Cross-trait Genome-Wide Analysis
    Lu, Zheng-An
    Ploner, Alexander
    Birgegard, Alexander
    Bulik, Cynthia M.
    Bergen, Sarah E.
    SCHIZOPHRENIA BULLETIN, 2024, 50 (05) : 1255 - 1265
  • [33] A genome-wide cross-trait analysis characterizes the shared genetic architecture between lung and gastrointestinal diseases
    Dongfang You
    Yaqian Wu
    Mengyi Lu
    Fang Shao
    Yingdan Tang
    Sisi Liu
    Liya Liu
    Zewei Zhou
    Ruyang Zhang
    Sipeng Shen
    Theis Lange
    Hongyang Xu
    Hongxia Ma
    Yongmei Yin
    Hongbing Shen
    Feng Chen
    David C. Christiani
    Guangfu Jin
    Yang Zhao
    Nature Communications, 16 (1)
  • [34] Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies
    Xiang-He Meng
    Hui Shen
    Xiang-Ding Chen
    Hong-Mei Xiao
    Hong-Wen Deng
    Human Genetics, 2018, 137 : 247 - 255
  • [35] Exploring the Genetic Correlation Between Growth and Immunity Based on Summary Statistics of Genome-Wide Association Studies
    Zhang, Zhe
    Ma, Peipei
    Li, Qiumeng
    Xiao, Qian
    Sun, Hao
    Olasege, Babatunde Shittu
    Wang, Qishan
    Pan, Yuchun
    FRONTIERS IN GENETICS, 2018, 9
  • [36] Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies
    Meng, Xiang-He
    Shen, Hui
    Chen, Xiang-Ding
    Xiao, Hong-Mei
    Deng, Hong-Wen
    HUMAN GENETICS, 2018, 137 (03) : 247 - 255
  • [37] BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES
    Zhu, Xiang
    Stephens, Matthew
    ANNALS OF APPLIED STATISTICS, 2017, 11 (03) : 1561 - 1592
  • [38] Partitioning heritability by functional annotation using genome-wide association summary statistics
    Hilary K Finucane
    Brendan Bulik-Sullivan
    Alexander Gusev
    Gosia Trynka
    Yakir Reshef
    Po-Ru Loh
    Verneri Anttila
    Han Xu
    Chongzhi Zang
    Kyle Farh
    Stephan Ripke
    Felix R Day
    Shaun Purcell
    Eli Stahl
    Sara Lindstrom
    John R B Perry
    Yukinori Okada
    Soumya Raychaudhuri
    Mark J Daly
    Nick Patterson
    Benjamin M Neale
    Alkes L Price
    Nature Genetics, 2015, 47 : 1228 - 1235
  • [39] Investigating the shared genetic architecture of uterine leiomyoma and breast cancer: A genome-wide cross-trait analysis
    Wu, Xueyao
    Xiao, Chenghan
    Han, Zhitong
    Zhang, Li
    Zhao, Xunying
    Hao, Yu
    Xiao, Jinyu
    Gallagher, C. Scott
    Kraft, Peter
    Morton, Cynthia Casson
    Li, Jiayuan
    Jiang, Xia
    AMERICAN JOURNAL OF HUMAN GENETICS, 2022, 109 (07) : 1272 - 1285
  • [40] Partitioning heritability by functional annotation using genome-wide association summary statistics
    Finucane, Hilary K.
    Bulik-Sullivan, Brendan
    Gusev, Alexander
    Trynka, Gosia
    Reshef, Yakir
    Loh, Po-Ru
    Anttila, Verneri
    Xu, Han
    Zang, Chongzhi
    Farh, Kyle
    Ripke, Stephan
    Day, Felix R.
    Purcell, Shaun
    Stahl, Eli
    Lindstrom, Sara
    Perry, John R. B.
    Okada, Yukinori
    Raychaudhuri, Soumya
    Daly, Mark J.
    Patterson, Nick
    Neale, Benjamin M.
    Price, Alkes L.
    NATURE GENETICS, 2015, 47 (11) : 1228 - +