Fast and scalable ensemble learning method for versatile polygenic risk prediction

Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1,4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; LINKAGE DISEQUILIBRIUM; SELECTION; REGRESSION; ACCURACY; DISEASE; MODELS; REGULARIZATION; ASSOCIATION; INSIGHTS; SCORE;
DOI
10.1073/pnas.2403210121
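Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract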
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods are limited in computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory usage compared with current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
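The abstract names two ingredients: an L0L2 penalized regression fitted from summary statistics, and an ensemble across tuning parameters. The Python sketch below illustrates the general idea only and is not the ALL-Sum R package's implementation. It assumes the commonly used summary-statistic objective b'Rb - 2r'b + lam0*||b||_0 + lam2*||b||_2^2, where r holds marginal SNP-trait correlations and R is a linkage disequilibrium (LD) matrix, solves it by plain cyclic coordinate descent, and combines the grid of fits by least squares on a hypothetical individual-level tuning set. All function names, the objective's exact form, and the ensemble step are assumptions introduced for illustration.

```python
import numpy as np

def l0l2_coordinate_descent(r, R, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for the assumed objective
    b'Rb - 2 r'b + lam0 * ||b||_0 + lam2 * ||b||_2^2."""
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    beta = np.zeros(len(r))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Partial residual correlation with SNP j's own effect removed.
            rho = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + lam2
            # Keep SNP j only if the objective decrease exceeds the L0 cost.
            new = rho / denom if rho * rho / denom > lam0 else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

def fit_grid(r, R, lam0_grid, lam2_grid):
    """One coefficient vector per (lam0, lam2) pair, stacked as rows."""
    return np.array([l0l2_coordinate_descent(r, R, a, b)
                     for a in lam0_grid for b in lam2_grid])

def ensemble_weights(G_tune, y_tune, betas):
    """Least-squares weights (intercept first) over the candidate scores,
    using hypothetical tuning genotypes G_tune and phenotypes y_tune."""
    scores = G_tune @ betas.T
    X = np.column_stack([np.ones(len(y_tune)), scores])
    w, *_ = np.linalg.lstsq(X, y_tune, rcond=None)
    return w
```

With an identity LD matrix the update reduces to hard thresholding followed by ridge shrinkage, which makes the roles of the two penalties easy to see: lam0 controls which SNPs enter the score and lam2 controls how strongly the retained effects are shrunk.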
Pages: 9