Fast and scalable ensemble learning method for versatile polygenic risk prediction

Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1, 4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; LINKAGE DISEQUILIBRIUM; SELECTION; REGRESSION; ACCURACY; DISEASE; MODELS; REGULARIZATION; ASSOCIATION; INSIGHTS; SCORE;
DOI
10.1073/pnas.2403210121
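Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract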
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, encompassing issues related to computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, with 15 times faster computation and half the memory of the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
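The abstract names two ingredients: an L0L2 penalized regression fitted from summary statistics, and an ensemble across tuning parameters. The sketch below illustrates the general idea only; it is not the ALL-Sum implementation or its R interface. It assumes a common summary-statistic objective, b'Rb - 2r'b + lam0*||b||_0 + lam2*||b||_2^2, where r holds marginal SNP-trait correlations from GWAS and R is a linkage disequilibrium (LD) correlation matrix, and it assumes the ensemble is a least-squares combination of the per-grid-point scores in a tuning sample. The function names, the objective scaling, and the ensemble rule are all hypothetical choices made for illustration.

```python
import numpy as np

def l0l2_coordinate_descent(r, R, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for the assumed objective
    b'Rb - 2 r'b + lam0 * ||b||_0 + lam2 * ||b||_2^2."""
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    beta = np.zeros(len(r))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Correlation of SNP j with the partial residual (SNP j excluded).
            c = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + lam2
            # A nonzero value lowers the objective by c^2/denom and costs lam0,
            # so keep the ridge-shrunken update only when the gain is larger.
            new = c / denom if c * c / denom > lam0 else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

def ensemble_weights(G_tune, y_tune, betas):
    """Assumed ensemble step: regress the tuning phenotype on the scores
    produced by each grid point (intercept in position 0)."""
    scores = G_tune @ np.column_stack(betas)
    X = np.column_stack([np.ones(len(y_tune)), scores])
    w, *_ = np.linalg.lstsq(X, y_tune, rcond=None)
    return w

# Toy usage: three uncorrelated SNPs and a small (lam0, lam2) grid.
R_toy = np.eye(3)
r_toy = np.array([0.5, 0.05, -0.3])
grid = [(0.01, 0.0), (0.01, 1.0)]
betas_toy = [l0l2_coordinate_descent(r_toy, R_toy, l0, l2) for l0, l2 in grid]
```

With an identity LD matrix each coordinate decouples, so the L0 term acts as a hard threshold on the marginal correlation and the L2 term shrinks the effects that survive it. Combining fits from several grid points, rather than selecting one, is the sense in which the sketch is an "ensemble".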
Pages: 9