Proteomic prediction of diverse incident diseases: a machine learning-guided biomarker discovery study using data from a prospective cohort study

Authors
Carrasco-Zanini, Julia [1, 3]
Pietzner, Maik [1, 2, 3]
Koprulu, Mine [1]
Wheeler, Eleanor [1]
Kerrison, Nicola [1]
Wareham, Nicholas J. [1]
Langenberg, Claudia [1, 2, 3, 4]
Affiliations
[1] Univ Cambridge, Sch Clin Med, Inst Metab Sci, MRC Epidemiol Unit, Cambridge, England
[2] Charite Univ Med Berlin, Berlin Inst Hlth, Computat Med, Berlin, Germany
[3] Queen Mary Univ London, Precis Healthcare Univ Res Inst, London, England
[4] Charite Univ Med Berlin, Berlin Inst Hlth, Computat Med, D-10117 Berlin, Germany
Source
LANCET DIGITAL HEALTH | 2024 / Volume 6 / Issue 07
Funding
UK Research and Innovation; UK Medical Research Council; Wellcome Trust (UK);
Keywords
PLASMA PROTEOME; LUNG-FUNCTION; CXCL17; RISK;
DOI
Not available
Chinese Library Classification number
R-058 [];
Discipline classification code
Abstract
Background: Broad-capture proteomic technologies have the potential to improve disease prediction, enabling targeted prevention and management, but studies have so far been limited to very few selected diseases and have not evaluated predictive performance across multiple conditions. We aimed to evaluate the potential of serum proteins to improve risk prediction over and above health-derived information and polygenic risk scores across a diverse set of 24 outcomes.
Methods: We designed multiple case-cohorts nested in the EPIC-Norfolk prospective study, from participants with available serum samples and genome-wide genotype data, with more than 32 974 person-years of follow-up. Participants were middle-aged individuals (aged 40-79 years at baseline) of European ancestry who were recruited from the general population of Norfolk, England, between March, 1993, and December, 1997. We selected participants who developed one of ten less common diseases within 10 years of follow-up; we also subsampled a randomly drawn control subcohort, which also served to investigate 14 more common outcomes (n>70), including all-cause premature mortality (death before the age of 75 years; case numbers 71-437; controls 608-1556). Individuals were excluded from the current study owing to failed genotyping or proteomic quality control, relatedness, or missing information on age, sex, BMI, or smoking status. We used a machine learning framework to derive sparse predictive protein models for the onset of the 23 individual diseases and all-cause premature mortality, and to derive a single common sparse multimorbidity signature that was predictive across multiple diseases from 2923 serum proteins.
Findings: Participants who developed one of ten less common diseases within 10 years of follow-up included 482 women and 507 men, with a mean age at baseline of 64·56 years (8·08). The random subcohort included 990 women and 769 men, with a mean age of 58·79 years (9·31). As few as five proteins alone outperformed polygenic risk scores for 17 of 23 outcomes (median difference in concordance index [C-index] 0·13 [0·10-0·17]) and improved predictive performance when added over basic patient-derived information models for seven outcomes, achieving a median C-index of 0·82 (IQR 0·77-0·82). This included diseases with poor prognosis such as lung cancer (C-index 0·85 [± cross-validation error 0·83-0·87]), for which we identified unreported biomarkers such as C-X-C motif chemokine ligand 17. A sparse multimorbidity signature of ten proteins improved prediction across seven outcomes over patient-derived information models, achieving performances (median C-index 0·81 [IQR 0·80-0·82]) similar to those of disease-specific signatures.
Interpretation: We show the value of broad-capture proteomic biomarker discovery studies across multiple diseases of diverse causes, pointing to those that might benefit the most from proteomic approaches, and the potential to derive common sparse biomarker panels for prediction of multiple diseases at once. This framework could enable follow-up studies to explore the generalisability of proteomic models and to benchmark these against clinical assays, which are required to understand the translational potential of these findings.
Funding: Medical Research Council, Health Data Research UK, UK Research and Innovation-National Institute for Health and Care Research, Cancer Research UK, and Wellcome Trust.
Copyright (c) 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
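Note on the C-index: the abstract reports predictive performance as a concordance index but does not say which estimator was used. As an illustration only, the sketch below computes Harrell's C-index for right-censored data in plain Python. The function name, the argument names, and the toy data are hypothetical and are not taken from the study; the sketch also ignores the weighting that a case-cohort design would normally require and simply skips pairs with tied follow-up times.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simple O(n^2) sketch).

    times       -- follow-up time per participant
    events      -- 1 if the outcome was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Assumption of this sketch: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four participants, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)  # 4 of 5 comparable pairs concordant -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (for example, 0·82) should be read.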
Pages: e470-e479
Page count: 10