Cross-trait prediction accuracy of summary statistics in genome-wide association studies

Citations: 1
Authors
Zhao, Bingxin [1 ]
Zou, Fei [1 ]
Zhu, Hongtu [1 ]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Keywords
BLUP; GWAS; high-dimension prediction; marginal estimator; polygenic risk score; ridge-type estimator; REGRESSION; REGULARIZATION; SELECTION; SCORES; RISK;
DOI
10.1111/biom.13661
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
In the era of big data, univariate models have been widely used as a workhorse tool for quickly producing marginal estimators, even in high-dimensional dense settings in which many features are "true" but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimator is popular, it has long been criticized for ignoring the correlation structure of genetic variants (i.e., the linkage disequilibrium [LD] pattern). In this paper, we study the effects of the LD pattern on the GWAS marginal estimator and investigate whether additionally accounting for LD can improve the prediction accuracy of complex traits. We consider a general high-dimensional dense setting for GWAS and study a class of ridge-type estimators, which includes the popular marginal estimator and the best linear unbiased prediction (BLUP) estimator as two special cases. We show that the performance of the GWAS marginal estimator depends on the LD pattern through the first three moments of its eigenvalue distribution. Furthermore, we find that the relative performance of the GWAS marginal and BLUP estimators depends strongly on the ratio of the GWAS sample size to the number of genetic variants. In particular, our findings reveal that the marginal estimator can easily become near-optimal within this class when the sample size is relatively small, even though it ignores the LD pattern. On the other hand, the BLUP estimator performs substantially better than the marginal estimator as the sample size increases toward the number of genetic variants, which is typically in the millions. Therefore, adjusting for LD (as in the BLUP) is most needed when the GWAS sample size is large. We illustrate the importance of our results using simulated data and real GWAS data.
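The comparison described in the abstract can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' code or their exact parametrization: the LD pattern is replaced by a toy AR(1) correlation with parameter `rho`, effects are dense and Gaussian with nominal heritability `h2`, the ridge-type class is taken to be `(X'X + lam * I)^{-1} X'y`, and the BLUP penalty uses the standard random-effects ratio of noise variance to per-variant effect variance. All function names, parameter names, and default values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_genotypes(n, p, rho, rng):
    """Draw n rows of p unit-variance features with AR(1) correlation rho.

    This is a toy stand-in for an LD pattern, not real genotype data.
    """
    z = rng.standard_normal((n, p))
    x = np.empty((n, p))
    x[:, 0] = z[:, 0]
    scale = np.sqrt(1.0 - rho ** 2)
    for j in range(1, p):
        x[:, j] = rho * x[:, j - 1] + scale * z[:, j]
    return x


def marginal_estimator(x, y):
    """Marginal (one-variant-at-a-time) estimator X'y / n; ignores correlation."""
    return x.T @ y / x.shape[0]


def ridge_type_estimator(x, y, lam):
    """Solve (X'X + lam * I) b = X'y for b."""
    p = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(p), x.T @ y)


def prediction_accuracy(beta_hat, x_test, y_test):
    """Correlation between predicted and observed trait in held-out data."""
    return np.corrcoef(x_test @ beta_hat, y_test)[0, 1]


def compare(n, p=200, rho=0.6, h2=0.5, n_test=2000, seed=0):
    """Return out-of-sample accuracy of the marginal and BLUP estimators."""
    rng = np.random.default_rng(seed)
    # Dense setting: every variant has a small nonzero effect.
    beta = rng.standard_normal(p) * np.sqrt(h2 / p)
    x = simulate_genotypes(n, p, rho, rng)
    x_test = simulate_genotypes(n_test, p, rho, rng)
    noise_sd = np.sqrt(1.0 - h2)
    y = x @ beta + noise_sd * rng.standard_normal(n)
    y_test = x_test @ beta + noise_sd * rng.standard_normal(n_test)
    # BLUP penalty: noise variance divided by per-variant effect variance.
    lam_blup = (1.0 - h2) / (h2 / p)
    return {
        "marginal": prediction_accuracy(marginal_estimator(x, y), x_test, y_test),
        "blup": prediction_accuracy(ridge_type_estimator(x, y, lam_blup), x_test, y_test),
    }


if __name__ == "__main__":
    # Vary the sample-size-to-variant ratio n / p with p fixed at 200.
    for n in (50, 200, 2000):
        print(n, compare(n))
```

In this sketch the marginal estimator sits at one end of the ridge-type class: as `lam` grows, `lam * ridge_type_estimator(x, y, lam)` approaches `X'y`, which is proportional to the marginal estimator. Sweeping `n` relative to `p` is one way to see how the gap between the two estimators changes with sample size in a toy setting.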
Pages: 841-853
Page count: 13
Related papers
50 items in total
  • [1] A Unifying Framework for Imputing Summary Statistics in Genome-Wide Association Studies
    Wu, Yue
    Eskin, Eleazar
    Sankararaman, Sriram
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2020, 27 (03) : 418 - 428
  • [2] Adjustment for covariates using summary statistics of genome-wide association studies
    Wang, Tao
    Xue, Xiaonan
    Xie, Xianhong
    Ye, Kenny
    Zhu, Xiaofeng
    Elston, Robert C.
    GENETIC EPIDEMIOLOGY, 2018, 42 (08) : 812 - 825
  • [3] Multi-trait analysis of genome-wide association summary statistics using MTAG
    Turley, Patrick
    Walters, Raymond K.
    Maghzian, Omeed
    Okbay, Aysu
    Lee, James J.
    Fontana, Mark Alan
    Tuan Anh Nguyen-Viet
    Wedow, Robbee
    Zacher, Meghan
    Furlotte, Nicholas A.
    Magnusson, Patrik
    Oskarsson, Sven
    Johannesson, Magnus
    Visscher, Peter M.
    Laibson, David
    Cesarini, David
    Neale, Benjamin M.
    Benjamin, Daniel J.
    NATURE GENETICS, 2018, 50 (02) : 229 - +
  • [4] Multi-trait analysis of genome-wide association summary statistics using MTAG
    Patrick Turley
    Raymond K. Walters
    Omeed Maghzian
    Aysu Okbay
    James J. Lee
    Mark Alan Fontana
    Tuan Anh Nguyen-Viet
    Robbee Wedow
    Meghan Zacher
    Nicholas A. Furlotte
    Patrik Magnusson
    Sven Oskarsson
    Magnus Johannesson
    Peter M. Visscher
    David Laibson
    David Cesarini
    Benjamin M. Neale
    Daniel J. Benjamin
    Nature Genetics, 2018, 50 : 229 - 237
  • [5] CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies
    Wang, Jianhua
    Huang, Dandan
    Zhou, Yao
    Yao, Hongcheng
    Liu, Huanhuan
    Zhai, Sinan
    Wu, Chengwei
    Zheng, Zhanye
    Zhao, Ke
    Wang, Zhao
    Yi, Xianfu
    Zhang, Shijie
    Liu, Xiaorong
    Liu, Zipeng
    Chen, Kexin
    Yu, Ying
    Sham, Pak Chung
    Li, Mulin Jun
    NUCLEIC ACIDS RESEARCH, 2020, 48 (D1) : D807 - D816
  • [6] An adaptive and robust method for multi-trait analysis of genome-wide association studies using summary statistics
    Deng, Qiaolan
    Song, Chi
    Lin, Shili
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2024, 32 (06) : 681 - 690
  • [7] Cross-trait genome-wide association analysis of C-reactive protein level and psychiatric disorders
    Hindley, Guy
    Drange, Ole Kristian
    Lin, Aihua
    Kutrolli, Gleda
    Shadrin, Alexey A.
    Parker, Nadine
    O'Connell, Kevin S.
    Rodevand, Linn
    Cheng, Weiqiu
    Bahrami, Shahram
    Karadag, Naz
    Holen, Borge
    Jaholkowski, Piotr
    Woldeyohannes, Markos Tesfaye
    Djurovic, Srdjan
    Dale, Anders M.
    Frei, Oleksandr
    Ueland, Thor
    Smeland, Olav B.
    Andreassen, Ole A.
    PSYCHONEUROENDOCRINOLOGY, 2023, 157
  • [8] A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies
    Shao, Zhonghe
    Wang, Ting
    Qiao, Jiahao
    Zhang, Yuchen
    Huang, Shuiping
    Zeng, Ping
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [9] A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies
    Zhonghe Shao
    Ting Wang
    Jiahao Qiao
    Yuchen Zhang
    Shuiping Huang
    Ping Zeng
    BMC Bioinformatics, 23
  • [10] Multiple phenotype association tests using summary statistics in genome-wide association studies
    Liu, Zhonghua
    Lin, Xihong
    BIOMETRICS, 2018, 74 (01) : 165 - 175