IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies

Cited by: 11
Authors
Dai, Mingwei [1, 2]
Ming, Jingsi [2]
Cai, Mingxuan [2]
Liu, Jin [3]
Yang, Can [2]
Wan, Xiang [4]
Xu, Zongben [1]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Math & Stat, Xian, Shaanxi, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Hong Kong, Hong Kong, Peoples R China
[3] Duke NUS Med Sch, Ctr Quantitat Med, Singapore, Singapore
[4] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
BAYESIAN VARIABLE SELECTION; HERITABILITY; REGRESSION; METAANALYSIS; TRAITS; MODELS; SET
DOI
10.1093/bioinformatics/btx314
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, a property known as 'polygenicity'. Tens of thousands of samples are often required to ensure sufficient statistical power to identify these small-effect variants. However, a research group can often obtain approval to access individual-level genotype data only for a limited sample size (e.g. a few hundred or a few thousand samples). Meanwhile, summary statistics generated by single-variant analysis are becoming publicly available, and the sample sizes behind these summary statistics datasets are usually quite large. How to make the most efficient use of these abundant data resources largely remains an open question.
Results: In this study, we propose a statistical approach, IGESS, that increases the statistical power to identify risk variants and improves the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods that take either individual-level data or summary statistics data alone as input. We applied IGESS to an integrative analysis of Crohn's Disease data from WTCCC and summary statistics from other studies. IGESS significantly increased the statistical power to identify risk variants and improved the risk prediction accuracy from 63.2% (±4%) to 69.4% (±1%) using about 240 000 variants.
Availability and implementation: The IGESS software is available at https://github.com/daviddaigithub/IGESS.
Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 2882 - 2889
Page count: 8
Related papers
50 in total
  • [1] Potential for Revealing Individual-Level Information in Genome-wide Association Studies
    Lumley, Thomas
    Rice, Kenneth
    [J]. JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2010, 303 (07): : 659 - 660
  • [2] LSMM: a statistical approach to integrating functional annotations with genome-wide association studies
    Ming, Jingsi
    Dai, Mingwei
    Cai, Mingxuan
    Wan, Xiang
    Liu, Jin
    Yang, Can
    [J]. BIOINFORMATICS, 2018, 34 (16) : 2788 - 2796
  • [3] A Unifying Framework for Imputing Summary Statistics in Genome-Wide Association Studies
    Wu, Yue
    Eskin, Eleazar
    Sankararaman, Sriram
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2020, 27 (03) : 418 - 428
  • [4] Adjustment for covariates using summary statistics of genome-wide association studies
    Wang, Tao
    Xue, Xiaonan
    Xie, Xianhong
    Ye, Kenny
    Zhu, Xiaofeng
    Elston, Robert C.
    [J]. GENETIC EPIDEMIOLOGY, 2018, 42 (08) : 812 - 825
  • [5] ON COMBINING INDIVIDUAL-LEVEL DATA WITH SUMMARY DATA IN STATISTICAL INFERENCES
    Deng, Lu
    Fu, Sheng
    Qin, Jing
    Yu, Kai
    [J]. STATISTICA SINICA, 2024, 34 (03) : 1505 - 1520
  • [6] A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies
    Shao, Zhonghe
    Wang, Ting
    Qiao, Jiahao
    Zhang, Yuchen
    Huang, Shuiping
    Zeng, Ping
    [J]. BMC BIOINFORMATICS, 2022, 23 (01)
  • [7] A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies
    Zhonghe Shao
    Ting Wang
    Jiahao Qiao
    Yuchen Zhang
    Shuiping Huang
    Ping Zeng
    [J]. BMC Bioinformatics, 23
  • [8] Multiple phenotype association tests using summary statistics in genome-wide association studies
    Liu, Zhonghua
    Lin, Xihong
    [J]. BIOMETRICS, 2018, 74 (01) : 165 - 175
  • [9] On Genetic Correlation Estimation With Summary Statistics From Genome-Wide Association Studies
    Zhao, Bingxin
    Zhu, Hongtu
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (537) : 1 - 11
  • [10] Comparison of three summary statistics for ranking genes in genome-wide association studies
    Freytag, Saskia
    Bickeboeller, Heike
    [J]. STATISTICS IN MEDICINE, 2014, 33 (11) : 1828 - 1841