EraSOR: a software tool to eliminate inflation caused by sample overlap in polygenic score analyses

Times cited: 10
Authors
Choi, Shing Wan [1]
Mak, Timothy Shin Heng [2]
Hoggart, Clive J.
O'Reilly, Paul F. [1]
Affiliations
[1] King's College London, MRC Social, Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, London SE5 8AF, England
[2] University of Hong Kong, Centre for Genomic Sciences, Pokfulam, Hong Kong, People's Republic of China
Source
GIGASCIENCE | 2023 / Vol. 12
Funding
Medical Research Council (UK); National Institutes of Health (US)
Keywords
REGRESSION; PROJECT; BIOBANK; RISK;
DOI
10.1093/gigascience/giad043
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Polygenic risk score (PRS) analyses are now routinely applied across biomedical research. However, as PRS studies grow in size, there is an increased risk of sample overlap between the genome-wide association study (GWAS) from which the PRS is derived and the "target sample," in which PRSs are computed and hypotheses are tested. Despite the wide recognition of the sample overlap problem, its potential impact on the results of PRS studies has not yet been quantified, and no analytical solution has been provided.
Findings: Here, we first conduct a comprehensive investigation into the scale of the sample overlap problem, finding that PRS results can be substantially inflated even when the overlap is minimal. Next, we introduce a method and software, EraSOR (Erase Sample Overlap and Relatedness), which eliminates the inflation caused by sample overlap (and close relatedness) in almost all settings tested here.
Conclusions: EraSOR could be useful in PRS studies (with a target sample size >1,000) similar to those investigated here, either (i) to mitigate the potential effects of known or unknown intercohort overlap and close relatedness or (ii) as a sensitivity tool to highlight the possible presence of sample overlap before its direct removal, when possible, or else to provide a lower bound on PRS analysis results after accounting for potential sample overlap.
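The inflation problem described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not EraSOR's method (the record does not describe how EraSOR works): it assumes independent, standardized SNPs, a null phenotype with no true genetic effects, and PRS weights equal to the marginal GWAS effect estimates. The function names `r2` and `simulate_overlap` and all parameter values are hypothetical. Any association between the PRS and the phenotype here is pure overfitting, so it should appear only when the target individuals were also in the GWAS.

```python
import numpy as np


def r2(score, pheno):
    """Squared Pearson correlation between a score and a phenotype."""
    return float(np.corrcoef(score, pheno)[0, 1] ** 2)


def simulate_overlap(n_gwas=2000, n_target=1000, n_snps=500, seed=1):
    """Toy illustration only (not the EraSOR algorithm).

    Returns the PRS R^2 in a target sample drawn from the GWAS sample
    (full overlap) and in an independent target sample, under a null model.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical standardized genotypes: independent SNPs, no LD.
    x_gwas = rng.standard_normal((n_gwas, n_snps))
    x_indep = rng.standard_normal((n_target, n_snps))
    # Null phenotypes: no SNP has any true effect.
    y_gwas = rng.standard_normal(n_gwas)
    y_indep = rng.standard_normal(n_target)
    # Marginal per-SNP effect estimates, as a GWAS would report them.
    betas = x_gwas.T @ (y_gwas - y_gwas.mean()) / n_gwas
    # Overlapping target: the first n_target GWAS individuals are reused.
    x_overlap, y_overlap = x_gwas[:n_target], y_gwas[:n_target]
    r2_overlap = r2(x_overlap @ betas, y_overlap)
    # Independent target: nobody in it contributed to the GWAS.
    r2_indep = r2(x_indep @ betas, y_indep)
    return r2_overlap, r2_indep


r2_overlap, r2_indep = simulate_overlap()
print(f"PRS R^2 with overlap: {r2_overlap:.3f}; independent target: {r2_indep:.3f}")
```

Under these assumptions the independent target should show an R^2 close to zero, while the overlapping target shows a much larger R^2 despite there being no genetic signal at all. This is the kind of spurious result the abstract attributes to sample overlap.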
Pages: 11