Fast and scalable ensemble learning method for versatile polygenic risk prediction

Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1, 4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; linkage disequilibrium; selection; regression; accuracy; disease; models; regularization; association; insights; score
DOI
10.1073/pnas.2403210121
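Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09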
Abstract
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods are limited in computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory usage of current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
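The abstract names two ingredients: an L0L2 penalized regression fitted from summary statistics, and an ensemble over tuning parameters. The sketch below illustrates one common way such a pipeline can be written; it is not the authors' implementation or the API of their R package. It assumes standardized genotypes, a vector `r` of marginal SNP-trait correlations from GWAS, an LD (correlation) matrix `R` from a reference panel, and the objective (1/2) b'Rb - r'b + lam0 * ||b||_0 + lam2 * ||b||_2^2. The function names, the coordinate-descent solver, and the least-squares combination on a tuning set are all illustrative assumptions.

```python
import numpy as np


def l0l2_coordinate_descent(r, R, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for an L0L2-penalized summary-statistic fit.

    Assumed objective (not taken from the paper):
        0.5 * b'Rb - r'b + lam0 * ||b||_0 + lam2 * ||b||_2^2
    r : (p,) marginal correlations; R : (p, p) LD matrix.
    """
    p = len(r)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual correlation with SNP j's own effect removed.
            z = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + 2.0 * lam2
            # Keep the coefficient only if the gain in fit exceeds the L0 cost.
            thresh = np.sqrt(2.0 * lam0 * denom)
            new = z / denom if abs(z) > thresh else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def ensemble_weights(G_tune, y_tune, betas):
    """Combine PRS from several tuning-parameter settings by least squares.

    G_tune : (n, p) standardized genotypes of a hypothetical tuning set.
    betas  : list of (p,) effect vectors, one per (lam0, lam2) pair.
    Returns the intercept followed by one weight per candidate PRS.
    """
    scores = G_tune @ np.column_stack(betas)  # one PRS column per setting
    design = np.column_stack([np.ones(len(y_tune)), scores])
    w, *_ = np.linalg.lstsq(design, y_tune, rcond=None)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 5
    R = np.eye(p)  # toy LD matrix: independent SNPs
    r = np.array([0.5, 0.01, -0.3, 0.0, 0.05])
    # Fit over a small illustrative grid of tuning parameters.
    grid = [(l0, l2) for l0 in (0.001, 0.01) for l2 in (0.0, 0.5)]
    betas = [l0l2_coordinate_descent(r, R, l0, l2) for l0, l2 in grid]
    G_tune = rng.normal(size=(200, p))
    y_tune = G_tune @ betas[0] + rng.normal(scale=0.5, size=200)
    print(ensemble_weights(G_tune, y_tune, betas))
```

The L0 term produces a hard threshold on each coordinate, while the L2 term shrinks the surviving coefficients. Combining fits across the grid avoids committing to a single sparsity level, which is one plausible reading of how an ensemble can adapt to different genetic architectures.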
Pages: 9