Stratifying risk of disease in haematuria patients using machine learning techniques to improve diagnostics

Cited by: 0
Authors
Drozdz, Anna [1]
Duggan, Brian [2]
Ruddock, Mark W. [3]
Reid, Cherith N. [3]
Kurth, Mary Jo [3]
Watt, Joanne [3]
Irvine, Allister [3]
Lamont, John [3]
Fitzgerald, Peter [3]
O'Rourke, Declan [4]
Curry, David [4]
Evans, Mark [4]
Boyd, Ruth [5]
Sousa, Jose [1, 6]
Affiliations
[1] Int Res Fdn, Sano Ctr Computat Personalised Med, Personal Hlth Data Sci Grp, Krakow, Poland
[2] Ulster Hosp Dundonald, South Eastern Hlth & Social Care Trust, Belfast, Northern Ireland
[3] Randox Labs Ltd, Clin Studies Grp, Crumlin, Antrim, Northern Ireland
[4] Belfast City Hosp, Belfast Hlth & Social Care Trust, Belfast, Northern Ireland
[5] Belfast City Hosp, Northern Ireland Clin Trials Network, Belfast, Northern Ireland
[6] Queens Univ, Inst Clin Sci, Ctr Publ Hlth, Belfast, Northern Ireland
Source
FRONTIERS IN ONCOLOGY | 2024 / Vol. 14
Keywords
biomarkers; bladder cancer; haematuria; machine learning; stratification; decision support system; unbalanced data; PROSTATE-SPECIFIC ANTIGEN; ASYMPTOMATIC MICROSCOPIC HEMATURIA; BLADDER-CANCER DIAGNOSIS; SERUM CYSTATIN-C; REFERENCE RANGES; ACTIVATION; URINE; EPIDEMIOLOGY; INTELLIGENCE; FINASTERIDE;
DOI
10.3389/fonc.2024.1401071
Chinese Library Classification (CLC) number
R73 [Oncology];
Discipline classification code
100214;
Abstract
Background: Detailed and invasive clinical investigations are required to identify the causes of haematuria. A highly unbalanced patient population (predominantly male) and a wide range of potential causes make it a major challenge to classify patients correctly and to identify patient-specific biomarkers. Studies have shown that multi-marker analysis with advanced analytical methods can improve diagnosis, even in unbalanced datasets. Here, we applied several machine learning algorithms to classify patients from the haematuria patient cohort (HaBio) by analysing multiple biomarkers and to identify the most relevant ones.
Materials and methods: We applied several classification and feature selection methods (k-means clustering, decision trees, random forest with the LIME explainer, and the CACTUS algorithm) to stratify patients into two groups: healthy (with no clear cause of haematuria) or sick (with an identified cause of haematuria, e.g., bladder cancer or infection). The classification performance of the models was compared. Biomarkers identified as important by the algorithms were also analysed in relation to their involvement in pathological processes.
Results: A high imbalance in the datasets significantly affected classification by random forest and decision trees, leading to overestimation of the sick class and low model performance. The CACTUS algorithm was more robust to the imbalance. CACTUS obtained a balanced accuracy of 0.747 for both genders, 0.718 for females and 0.803 for males. For the whole dataset, microalbumin, male gender, and tPSA emerged as the most informative biomarkers. For males, age, microalbumin, tPSA, cystatin C, BTA, HAD and S100A4 were the most significant biomarkers, while for females they were microalbumin, IL-8, pERK, and CXCL16.
Conclusions: The CACTUS algorithm demonstrated improved performance compared with other methods such as decision trees and random forest. Additionally, we identified the most relevant biomarkers for each specific patient group, which could be considered in the future as novel biomarkers for diagnosis. Our results have the potential to inform future research and provide new personalised diagnostic approaches tailored directly to the needs of individuals.
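The abstract reports balanced accuracy rather than plain accuracy. The sketch below is a minimal illustration of why that metric suits an unbalanced cohort. It is not the authors' code. The 90/10 class split and the "always sick" classifier are hypothetical, chosen only to mimic a model that overestimates the sick class.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all samples that were labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally
    regardless of how many samples it contains."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical unbalanced cohort (not data from the study).
y_true = ["sick"] * 90 + ["healthy"] * 10

# A degenerate classifier that assigns every patient to the majority class.
always_sick = ["sick"] * 100

naive_acc = accuracy(y_true, always_sick)           # high despite no real skill
naive_bal = balanced_accuracy(y_true, always_sick)  # exposes the missed class

print(f"accuracy={naive_acc:.3f}, balanced accuracy={naive_bal:.3f}")
```

On this toy input the majority-class predictor scores well on plain accuracy while its balanced accuracy drops to chance level, since the recall for the healthy class is zero.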
Pages: 15