Proteomic prediction of diverse incident diseases: a machine learning-guided biomarker discovery study using data from a prospective cohort study

Cited by: 4
Authors
Carrasco-Zanini J. [1, 3]
Pietzner M. [1, 2, 3]
Koprulu M. [1]
Wheeler E. [1]
Kerrison N.D. [1]
Wareham N.J. [1]
Langenberg C. [1, 2, 3]
Affiliations
[1] MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge
[2] Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin
[3] Precision Healthcare University Research Institute, Queen Mary University of London, London
Source
The Lancet Digital Health | 2024 / Volume 6 / Issue 07
Funding
UK Medical Research Council; UK Research and Innovation; Wellcome Trust
Keywords
Disease control; Health risks; Machine learning; Proteins; Quality control
DOI
10.1016/S2589-7500(24)00087-6
Abstract
Background: Broad-capture proteomic technologies have the potential to improve disease prediction, enabling targeted prevention and management, but studies have so far been limited to very few selected diseases and have not evaluated predictive performance across multiple conditions. We aimed to evaluate the potential of serum proteins to improve risk prediction over and above health-derived information and polygenic risk scores across a diverse set of 24 outcomes.
Methods: We designed multiple case-cohorts nested in the EPIC-Norfolk prospective study, from participants with available serum samples and genome-wide genotype data, with more than 32 974 person-years of follow-up. Participants were middle-aged individuals (aged 40–79 years at baseline) of European ancestry who were recruited from the general population of Norfolk, England, between March, 1993 and December, 1997. We selected participants who developed one of ten less common diseases within 10 years of follow-up; we also subsampled a randomly drawn control subcohort, which also served to investigate 14 more common outcomes (n>70), including all-cause premature mortality (death before the age of 75 years; case numbers 71–437; controls 608–1556). Individuals were excluded from the current study owing to failed genotyping or proteomic quality control, relatedness, or missing information on age, sex, BMI, or smoking status. We used a machine learning framework to derive sparse predictive protein models for the onset of the 23 individual diseases and all-cause premature mortality, and to derive a single common sparse multimorbidity signature that was predictive across multiple diseases from 2923 serum proteins.
Findings: Participants who developed one of ten less common diseases within 10 years of follow-up included 482 women and 507 men, with a mean age at baseline of 64·56 years (8·08). The random subcohort included 990 women and 769 men, with a mean age of 58·79 years (9·31). As few as five proteins alone outperformed polygenic risk scores for 17 of 23 outcomes (median difference in concordance index [C-index] 0·13 [0·10–0·17]) and improved predictive performance when added over basic patient-derived information models for seven outcomes, achieving a median C-index of 0·82 (IQR 0·77–0·82). This included diseases with poor prognosis such as lung cancer (C-index 0·85 [+/− cross-validation error 0·83–0·87]), for which we identified unreported biomarkers such as C-X-C motif chemokine ligand 17. A sparse multimorbidity signature of ten proteins improved prediction across seven outcomes over patient-derived information models, achieving performances (median C-index 0·81 [IQR 0·80–0·82]) similar to those of disease-specific signatures.
Interpretation: We show the value of broad-capture proteomic biomarker discovery studies across multiple diseases of diverse causes, pointing to those that might benefit the most from proteomic approaches, and the potential to derive common sparse biomarker panels for prediction of multiple diseases at once. This framework could enable follow-up studies to explore the generalisability of proteomic models and to benchmark these against clinical assays, which are required to understand the translational potential of these findings.
Funding: Medical Research Council, Health Data Research UK, UK Research and Innovation–National Institute for Health and Care Research, Cancer Research UK, and Wellcome Trust.
© 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
Pages: e470-e479
Number of pages: 9