Inflated expectations: Rare-variant association analysis using public controls

Cited by: 3
Authors
Kim, Jung [1]
Karyadi, Danielle M. [1]
Hartley, Stephen W. [1]
Zhu, Bin [1]
Wang, Mingyi [2, 3]
Wu, Dongjing [2, 3]
Song, Lei [1]
Armstrong, Gregory T. [4]
Bhatia, Smita [5]
Robison, Leslie L. [4]
Yasui, Yutaka [4]
Carter, Brian [6]
Sampson, Joshua N. [1]
Freedman, Neal D. [1]
Goldstein, Alisa M. [1]
Mirabello, Lisa [1]
Chanock, Stephen J. [1]
Morton, Lindsay M. [1]
Savage, Sharon A. [1]
Stewart, Douglas R. [1]
Affiliations
[1] NCI, Div Canc Epidemiol & Genet, Rockville, MD 20850 USA
[2] NCI, Div Canc Epidemiol & Genet, Canc Genom Res Lab, Rockville, MD USA
[3] Frederick Natl Lab Canc Res, Leidos Biomed Res Inc, Frederick, MD USA
[4] St Jude Childrens Res Hosp, Dept Epidemiol & Canc Control, Memphis, TN USA
[5] Univ Alabama Birmingham, Inst Canc Outcomes & Survivorship, Birmingham, AL USA
[6] Amer Canc Soc, Dept Populat Sci, Atlanta, GA USA
Source
PLOS ONE | 2023 / Vol. 18 / Issue 01
Keywords
DESIGN;
DOI
10.1371/journal.pone.0280951
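Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract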
The use of publicly available sequencing datasets as controls (hereafter, "public controls") in studies of rare-variant disease associations has great promise but can increase the risk of false-positive discovery. The specific factors that could contribute to an inflated distribution of test statistics have not been systematically examined. Here, we leveraged both public controls (gnomAD v2.1) and several datasets sequenced in our laboratory to systematically investigate factors that could contribute to false-positive discovery, as measured by λΔ95, a measure that quantifies the degree of inflation in statistical significance. Analyses of the datasets in this investigation found that 1) the significantly inflated distribution of test statistics decreased substantially when the same variant caller and filtering pipelines were employed, 2) differences in library prep kits and sequencers did not affect the false-positive discovery rate, and 3) joint vs. separate variant calling of cases and controls did not contribute to the inflation of test statistics. Currently available methods do not adequately adjust for the high rate of false-positive discovery. These results, especially if replicated, emphasize the risks of using public controls for rare-variant association tests in which individual-level data and the computational pipeline are not readily accessible, which prevents the use of the same variant-calling and filtering pipelines on both cases and controls. A plausible solution exists with the emergence of cloud-based computing, which can make it possible to bring containerized analytical pipelines to the data (rather than the data to the pipeline) and could avert or minimize these issues. We suggest that future reports account for this issue and state it as a limitation when reporting new findings based on studies that cannot practically analyze all data on a single pipeline.
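The abstract names an inflation measure but does not define it in this record. As a related illustration only, the sketch below computes the conventional genomic inflation factor (median observed 1-df chi-square statistic divided by the null median), which is a different and more widely known quantity than the one the authors report. The function name `genomic_inflation` and the input layout (a flat list of two-sided p-values, one per tested gene or variant) are assumptions made for this example.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Conventional genomic inflation factor (illustrative, not the paper's measure).

    Each two-sided p-value in (0, 1] is mapped to a 1-df chi-square statistic,
    and the observed median is divided by the median expected under the null.
    """
    nd = NormalDist()
    # chi2 = z^2 where z is the normal quantile at p/2 (the sign vanishes on squaring)
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    # Median of a 1-df chi-square under the null, roughly 0.455
    null_median = nd.inv_cdf(0.25) ** 2
    return median(chi2) / null_median


# Evenly spaced p-values mimic a well-calibrated test
n = 1001
calibrated = [(i - 0.5) / n for i in range(1, n + 1)]
# Squaring shifts the p-values toward zero, mimicking inflation
inflated = [p ** 2 for p in calibrated]

lambda_calibrated = genomic_inflation(calibrated)
lambda_inflated = genomic_inflation(inflated)
```

A value near 1 suggests the test statistics follow the null distribution, while values well above 1 suggest the kind of systematic inflation the abstract attributes to mismatched variant-calling and filtering pipelines.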
Pages: 13