Fast and scalable ensemble learning method for versatile polygenic risk prediction

Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1, 4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; LINKAGE DISEQUILIBRIUM; SELECTION; REGRESSION; ACCURACY; DISEASE; MODELS; REGULARIZATION; ASSOCIATION; INSIGHTS; SCORE;
DOI
10.1073/pnas.2403210121
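Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract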
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods are limited in computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory usage of current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
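The abstract does not spell out the optimization, so the following is only a minimal sketch of the general idea, not the ALL-Sum implementation. It assumes a summary-statistic objective of the form b'Rb - 2 b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2, where R is a linkage disequilibrium (LD) correlation matrix and r holds the marginal SNP-trait correlations. It solves that objective by cyclic coordinate descent and then combines the candidate scores from a tuning-parameter grid by least squares on a tuning sample. The function names, the scaling of the objective, and the least-squares combination are all illustrative assumptions.

```python
import numpy as np


def l0l2_coordinate_descent(R, r, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for an assumed L0L2 summary-statistic objective:
        b'Rb - 2 b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2
    R: (p, p) LD correlation matrix; r: (p,) marginal SNP-trait correlations.
    """
    p = len(r)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of SNP j with the trait after removing the other SNPs.
            t = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + lam2
            # A nonzero coefficient lowers the objective by t^2 / denom and
            # costs lam0, so keep it only when the gain exceeds the cost.
            new = t / denom if t * t > lam0 * denom else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def fit_grid(R, r, lam0_grid, lam2_grid):
    """Fit one coefficient vector per (lam0, lam2) pair; returns a (p, K) matrix."""
    fits = [l0l2_coordinate_descent(R, r, l0, l2)
            for l0 in lam0_grid for l2 in lam2_grid]
    return np.column_stack(fits)


def ensemble_weights(X_tune, y_tune, betas):
    """Combine candidate scores by least squares on a tuning sample (an assumption).

    X_tune: (n, p) standardized genotypes; betas: (p, K) candidate coefficients.
    Returns (intercept, weights) so that the ensemble score is
    intercept + (X @ betas) @ weights.
    """
    scores = X_tune @ betas
    design = np.column_stack([np.ones(len(y_tune)), scores])
    coef, *_ = np.linalg.lstsq(design, y_tune, rcond=None)
    return coef[0], coef[1:]
```

In this sketch the L0 term controls how many variants enter the score and the L2 term shrinks the ones that do, so a grid over both covers sparse and highly polygenic settings; the ensemble step then weights the grid of fits rather than picking a single one.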
Pages: 9