Boosting predictive models and augmenting patient data with relevant genomic and pathway information

Authors
Buosi S. [1]
Timilsina M. [1]
Torrente M. [4]
Provencio M. [4]
Fey D. [5]
Nováček V. [1,2,3]
Affiliations
[1] Data Science Institute, University of Galway, University Road, Co. Galway
[2] Faculty of Informatics, Masaryk University, Botanická 68a
[3] Masaryk Memorial Cancer Institute, Žlutý kopec 7
[4] Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, C. Joaquín Rodrigo, 1, Madrid, Majadahonda
[5] Systems Biology Ireland, University College Dublin, Co. Dublin
Funding
Science Foundation Ireland; European Union Horizon 2020
Keywords
Knowledge graph embedding; Link prediction; Machine learning; Non-small-cell lung cancer; Tumor recurrence prediction;
DOI
10.1016/j.compbiomed.2024.108398
Abstract
The recurrence of low-stage lung cancer poses a challenge due to its unpredictable nature and diverse patient responses to treatments. Personalized care and patient outcomes heavily rely on early relapse identification, yet current predictive models, despite their potential, lack comprehensive genetic data. This inadequacy fuels our research focus—integrating specific genetic information, such as pathway scores, into clinical data. Our aim is to refine machine learning models for more precise relapse prediction in early-stage non-small cell lung cancer. To address the scarcity of genetic data, we employ imputation techniques, leveraging publicly available datasets such as The Cancer Genome Atlas (TCGA), integrating pathway scores into our patient cohort from the Cancer Long Survivor Artificial Intelligence Follow-up (CLARIFY) project. Through the integration of imputed pathway scores from the TCGA dataset with clinical data, our approach achieves notable strides in predicting relapse among a held-out test set of 200 patients. By training machine learning models on enriched knowledge graph data, inclusive of triples derived from pathway score imputation, we achieve a promising precision of 82% and specificity of 91%. These outcomes highlight the potential of our models as supplementary tools within tumour, node, and metastasis (TNM) classification systems, offering improved prognostic capabilities for lung cancer patients. In summary, our research underscores the significance of refining machine learning models for relapse prediction in early-stage non-small cell lung cancer. Our approach, centered on imputing pathway scores and integrating them with clinical data, not only enhances predictive performance but also demonstrates the promising role of machine learning in anticipating relapse and ultimately elevating patient outcomes. © 2024 The Authors
Related papers
36 in total
  • [1] A General Kernel Boosting Framework Integrating Pathways for Predictive Modeling Based on Genomic Data
    Zeng, Li
    Yu, Zhaolong
    Zhang, Yiliang
    Zhao, Hongyu
    13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022, 2022,
  • [2] A Pathway-Based Kernel Boosting Method for Sample Classification Using Genomic Data
    Zeng, Li
    Yu, Zhaolong
    Zhao, Hongyu
    GENES, 2019, 10 (09)
  • [3] Predictive models for the effectiveness of data fusion in information retrieval
    Ng, KB
    NATIONAL ONLINE MEETING, PROCEEDINGS 2000, 2000, : 291 - 302
  • [4] Statistical representation models for mutation information within genomic data
    N. Özlem ÖZCAN ŞİMŞEK
    Arzucan ÖZGÜR
    Fikret GÜRGEN
    BMC Bioinformatics, 20
  • [5] Statistical representation models for mutation information within genomic data
    Ozcan Simsek, N. Ozlem
    Ozgur, Arzucan
    Gurgen, Fikret
    BMC BIOINFORMATICS, 2019, 20 (1)
  • [6] Incorporating pathway information into boosting estimation of high-dimensional risk prediction models
    Harald Binder
    Martin Schumacher
    BMC Bioinformatics, 10
  • [7] Incorporating pathway information into boosting estimation of high-dimensional risk prediction models
    Binder, Harald
    Schumacher, Martin
    BMC BIOINFORMATICS, 2009, 10
  • [8] Nonparametric pathway-based regression models for analysis of genomic data
    Wei, Zhi
    Li, Hongzhe
    BIOSTATISTICS, 2007, 8 (02) : 265 - 284
  • [9] Novel approaches for improving interpretation and predictive models of comparative genomic hybridization data
    Rotroff, Daniel
    Breen, Matthew
    Motsinger-Reif, Alison
    CANCER RESEARCH, 2015, 75
  • [10] Learning patient-specific predictive models from clinical data
    Visweswaran, Shyam
    Angus, Derek C.
    Hsieh, Margaret
    Weissfeld, Lisa
    Yealy, Donald
    Cooper, Gregory F.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (05) : 669 - 685