Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

Cited by: 2
Authors
Lu, Haohui [1]
Uddin, Shahadat [1]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Project Management, Level 2,21 Ross St, Forest Lodge, NSW 2037, Australia
Keywords
Disease prediction; Performance comparison; Unsupervised machine learning; Healthcare dataset; DIAGNOSIS
DOI
10.1007/s12553-023-00805-8
Chinese Library Classification (CLC) number
R-058
Abstract
Purpose: Disease risk prediction poses a significant and growing challenge in the medical field. While researchers have increasingly utilised machine learning (ML) algorithms to tackle this issue, supervised ML methods remain dominant. However, there is rising interest in unsupervised techniques, especially where data labels may be missing, as with undiagnosed or rare diseases. This study compares unsupervised ML models for disease prediction.
Methods: This study evaluated the efficacy of seven unsupervised algorithms on 15 datasets, including those of heart failure, diabetes, and breast cancer. It compared them on six performance metrics: Adjusted Rand Index, Adjusted Mutual Information, Homogeneity, Completeness, V-measure and Silhouette Coefficient.
Results: Among the seven unsupervised ML methods, DBSCAN (density-based spatial clustering of applications with noise) showed the best performance most often (31 times), followed by the Bayesian Gaussian Mixture (18) and Divisive clustering (15). No single model consistently outperformed the others across every dataset and metric. The study emphasises the crucial role of selecting the model and the performance measure according to application-specific needs. For example, DBSCAN excels on the Homogeneity, Completeness and V-measure metrics, whereas the Bayesian Gaussian Mixture performs well on the Adjusted Rand Index. The code used in this study can be found at https://github.com/haohuilu/unsupervisedml/.
Conclusion: This research contributes deeper insights into unsupervised ML applications in healthcare and encourages further investigation into model selection. Subsequent studies could harness genuine disease records for a more nuanced comparison and evaluation of models.
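Three of the six metrics named in the abstract (Homogeneity, Completeness and V-measure) compare cluster assignments against known class labels using entropy. The sketch below is a minimal standard-library illustration of their usual definitions, not the authors' code (which is in the repository linked above); the function names and the toy labels are hypothetical. scikit-learn offers an equivalent in `sklearn.metrics.homogeneity_completeness_v_measure`.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) for equal-length label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(true_labels, cluster_labels):
    """Return (homogeneity, completeness, V-measure) for one clustering."""
    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single class.
    if h_true == 0:
        homogeneity = 1.0
    else:
        homogeneity = 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    # Completeness: all members of a class fall into the same cluster.
    if h_cluster == 0:
        completeness = 1.0
    else:
        completeness = 1.0 - conditional_entropy(cluster_labels, true_labels) / h_cluster
    # V-measure: harmonic mean of homogeneity and completeness.
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


if __name__ == "__main__":
    # Hypothetical toy data: two diagnoses, over-segmented into four clusters.
    diagnosis = [0, 0, 1, 1]
    clusters = [0, 1, 2, 3]
    print(homogeneity_completeness_v(diagnosis, clusters))
```

In the toy case every cluster is pure, so Homogeneity is high, but each diagnosis is split across clusters, so Completeness is lower. The two can diverge in this way, which is one reason the choice of metric affects which model appears best.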
Pages: 141-154
Number of pages: 14