A new method for multiancestry polygenic prediction improves performance across diverse populations

Cited by: 27
Authors
Zhang H. [1 ,2 ]
Zhan J. [3 ]
Jin J. [4 ,5 ]
Zhang J. [4 ]
Lu W. [6 ]
Zhao R. [4 ]
Ahearn T.U. [1 ]
Yu Z. [7 ]
O’Connell J. [3 ]
Jiang Y. [3 ]
Chen T. [2 ]
Okuhara D. [8 ]
Aslibekyan S. [3 ]
Auton A. [3 ]
Babalola E. [3 ]
Bell R.K. [3 ]
Bielenberg J. [3 ]
Bryc K. [3 ]
Bullis E. [3 ]
Coker D. [3 ]
Partida G.C. [3 ]
Dhamija D. [3 ]
Das S. [3 ]
Elson S.L. [3 ]
Eriksson N. [3 ]
Filshtein T. [3 ]
Fitch A. [3 ]
Fletez-Brant K. [3 ]
Fontanillas P. [3 ]
Freyman W. [3 ]
Granka J.M. [3 ]
Heilbron K. [3 ]
Hernandez A. [3 ]
Hicks B. [3 ]
Hinds D.A. [3 ]
Jewett E.M. [3 ]
Kukar K. [3 ]
Kwong A. [3 ]
Lin K.-H. [3 ]
Llamas B.A. [3 ]
Lowe M. [3 ]
McCreight J.C. [3 ]
McIntyre M.H. [3 ]
Micheletti S.J. [3 ]
Moreno M.E. [3 ]
Nandakumar P. [3 ]
Nguyen D.T. [3 ]
Noblin E.S. [3 ]
Petrakovitz A.A. [3 ]
Poznik G.D. [3 ]
Affiliations
[1] Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD
[2] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
[3] 23andMe, Inc., Sunnyvale, CA
[4] Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
[5] Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA
[6] Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD
[7] Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
[8] Booz Allen Hamilton Inc., McLean, VA
[9] Division of Genetics and Epidemiology, Institute of Cancer Research, London
[10] Department of Statistics, Harvard University, Cambridge, MA
[11] Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD
Funding
US National Institutes of Health
DOI
10.1038/s41588-023-01501-z
Abstract
Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction. © 2023, The Author(s), under exclusive licence to Springer Nature America, Inc.
Pages: 1757-1768
Page count: 11
Related papers
50 in total
  • [31] Performance of Oncotype DX DCIS score across diverse ethnic populations
    Mittar, P.
    Casella, S.
    Bombonati, A.
    Emiloju, O.
    Jablon, L.
    Schultz, D.
    Leighton, J. C.
    Solin, L. J.
    CANCER RESEARCH, 2019, 79 (04)
  • [32] Polygenic Selection and Environmental Influence on Adult Body Height: Genetic and Living Standard Contributions Across Diverse Populations
    Piffer, Davide
    Kirkegaard, Emil O. W.
    TWIN RESEARCH AND HUMAN GENETICS, 2024, 27 (06) : 265 - 282
  • [33] Variability in performance of genetic-enhanced DXA-BMD prediction models across diverse ethnic and geographic populations: A risk prediction study
    Liu, Yong
    Meng, Xiang-He
    Wu, Chong
    Su, Kuan-Jui
    Liu, Anqi
    Tian, Qing
    Zhao, Lan-Juan
    Qiu, Chuan
    Luo, Zhe
    Gonzalez-Ramirez, Martha I.
    Shen, Hui
    Xiao, Hong-Mei
    Deng, Hong-Wen
    PLOS MEDICINE, 2024, 21 (08)
  • [34] Consequences of population genetic differences in genetic risk prediction across diverse human populations
    Martin, A. R.
    Kanai, M.
    Kamatani, Y.
    Okada, Y.
    Neale, B. M.
    Daly, M. J.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2019, 27 : 1066 - 1067
  • [35] Transcriptome prediction performance across machine learning models and diverse ancestries
    Okoro, Paul C.
    Schubert, Ryan
    Guo, Xiuqing
    Johnson, W. Craig
    Rotter, Jerome I.
    Hoeschele, Ina
    Liu, Yongmei
    Im, Hae Kyung
    Luke, Amy
    Dugas, Lara R.
    Wheeler, Heather E.
    HUMAN GENETICS AND GENOMICS ADVANCES, 2021, 2 (02):
  • [36] Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations
    Elgart, Michael
    Lyons, Genevieve
    Romero-Brufau, Santiago
    Kurniansyah, Nuzulul
    Brody, Jennifer A.
    Guo, Xiuqing
    Lin, Henry J.
    Raffield, Laura
    Gao, Yan
    Chen, Han
    de Vries, Paul
    Lloyd-Jones, Donald M.
    Lange, Leslie A.
    Peloso, Gina M.
    Fornage, Myriam
    Rotter, Jerome I.
    Rich, Stephen S.
    Morrison, Alanna C.
    Psaty, Bruce M.
    Levy, Daniel
    Redline, Susan
    Sofer, Tamar
    COMMUNICATIONS BIOLOGY, 2022, 5 (01)
  • [37] Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations
    Michael Elgart
    Genevieve Lyons
    Santiago Romero-Brufau
    Nuzulul Kurniansyah
    Jennifer A. Brody
    Xiuqing Guo
    Henry J. Lin
    Laura Raffield
    Yan Gao
    Han Chen
    Paul de Vries
    Donald M. Lloyd-Jones
    Leslie A. Lange
    Gina M. Peloso
    Myriam Fornage
    Jerome I. Rotter
    Stephen S. Rich
    Alanna C. Morrison
    Bruce M. Psaty
    Daniel Levy
    Susan Redline
    Tamar Sofer
    Communications Biology, 5
  • [38] Multiancestry Genome-Wide Association Study Reveals Distinct Biological Pathways and Cell Types Driving Type 2 Diabetes Risk with Heterogeneous Effects across Diverse Populations
    Suzuki, Ken
    Hatzikotoulas, Konstantinos
    Southam, Lorraine
    Yin, Xianyong
    Lorenz, Kimberly
    Mandla, Ravi
    Taylor, Henry J.
    Huerta, Alicia
    Rayner, Nigel W.
    Meigs, James B.
    McCarthy, Mark I.
    Mahajan, Anubha
    Mercader, Josep M.
    Spracklen, Cassandra
    Boehnke, Michael
    Vujkovic, Marijana
    Rotter, Jerome I.
    Voight, Benjamin F.
    Zeggini, Eleftheria
    Morris, Andrew
    DIABETES, 2023, 72
  • [39] Quantifying factors that affect polygenic risk score performance across diverse ancestries and age groups for body mass index
    Hui, Daniel
    Xiao, Brenda
    Dikilitas, Ozan
    Freimuth, Robert R.
    Irvin, Marguerite R.
    Jarvik, Gail P.
    Kottyan, Leah
    Kullo, Iftikhar
    Limdi, Nita A.
    Liu, Cong
    Luo, Yuan
    Namjou, Bahram
    Puckelwartz, Megan J.
    Schaid, Daniel
    Tiwari, Hemant
    Wei, Wei-Qi
    Verma, Shefali
    Kim, Dokyoon
    Ritchie, Marylyn D.
    BIOCOMPUTING 2023, PSB 2023, 2023, : 437 - 448
  • [40] A new ensemble learning method stratified sampling blending optimizes conventional blending and improves prediction performance
    Miao, Na
    Yang, Mengke
    Han, Pingping
    Qiao, Jiakun
    Che, Zhaoxuan
    Xu, Fangjun
    Dai, Xiangyu
    Zhu, Mengjin
    BIOINFORMATICS ADVANCES, 2025, 5 (01):