Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

Cited by: 2
Authors
Lu, Haohui [1]
Uddin, Shahadat [1]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Project Management, Level 2, 21 Ross St, Forest Lodge, NSW 2037, Australia
Keywords
Disease prediction; Performance comparison; Unsupervised machine learning; Healthcare dataset; Diagnosis
DOI
10.1007/s12553-023-00805-8
Chinese Library Classification (CLC) number
R-058
Abstract
Purpose: Disease risk prediction poses a significant and growing challenge in the medical field. While researchers have increasingly utilised machine learning (ML) algorithms to tackle this issue, supervised ML methods remain dominant. However, there is rising interest in unsupervised techniques, especially where data labels may be missing, as with undiagnosed or rare diseases. This study compares unsupervised ML models for disease prediction.
Methods: This study evaluated seven unsupervised algorithms on 15 datasets, including those for heart failure, diabetes, and breast cancer. The comparison used six performance metrics: Adjusted Rand Index, Adjusted Mutual Information, Homogeneity, Completeness, V-measure and Silhouette Coefficient.
Results: Among the seven unsupervised ML methods, DBSCAN (density-based spatial clustering of applications with noise) showed the best performance most often (31 times), followed by the Bayesian Gaussian Mixture (18) and Divisive clustering (15). No single model consistently outperformed the others across every dataset and metric. The study emphasises the crucial role of selecting models and performance measures according to application-specific needs. For example, DBSCAN excels on the Homogeneity, Completeness and V-measure metrics, whereas the Bayesian Gaussian Mixture performs well on the Adjusted Rand Index. The code used in this study can be found at https://github.com/haohuilu/unsupervisedml/.
Conclusion: This research contributes deeper insights into unsupervised ML applications in healthcare and encourages further investigation into model selection. Subsequent studies could use genuine disease records for a more nuanced comparison and evaluation of models.
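Three of the six metrics named in the abstract (Homogeneity, Completeness and V-measure) follow directly from label entropies. The sketch below uses only the Python standard library to show how they relate to each other. It is an illustration under stated assumptions, not the code from the study's repository: the function names are hypothetical, natural logarithms are used, and a zero-entropy labelling is scored as 1.0 by convention.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(classes, clusters):
    """Return (homogeneity, completeness, V-measure).

    classes  : ground-truth labels (e.g. diagnosed / not diagnosed)
    clusters : cluster assignments produced by an unsupervised model
    Assumption: a labelling with zero entropy is scored as 1.0.
    """
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - conditional_entropy(classes, clusters) / h_classes
    )
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    )
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    truth = [0, 0, 1, 1]
    over_split = [0, 1, 2, 3]  # every point in its own cluster
    print(homogeneity_completeness_v(truth, over_split))
```

In the toy call, every cluster is pure, so Homogeneity is high, but each class is split across clusters, so Completeness is penalised. V-measure is the harmonic mean of the two, which is why a method that leads on one of these metrics tends to rank well on the others.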
Pages: 141-154
Number of pages: 14