Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

Cited by: 2

Authors:

Lu, Haohui [1]

Uddin, Shahadat [1]

Affiliation:

[1] Univ Sydney, Fac Engn, Sch Project Management, Level 2, 21 Ross St, Forest Lodge, NSW 2037, Australia

Source:

HEALTH AND TECHNOLOGY | 2024 / Vol. 14 / Issue 01

Keywords:

Disease prediction; Performance comparison; Unsupervised machine learning; Healthcare dataset; Diagnosis

DOI:

10.1007/s12553-023-00805-8

Chinese Library Classification (CLC) number:

R-058

Abstract:

Purpose: Disease risk prediction poses a significant and growing challenge in the medical field. Researchers have increasingly used machine learning (ML) algorithms to tackle this issue, and supervised ML methods remain dominant. However, interest in unsupervised techniques is rising, especially where data labels may be missing, as with undiagnosed or rare diseases. This study compares unsupervised ML models for disease prediction.

Methods: The study evaluated seven unsupervised algorithms on 15 datasets, including datasets for heart failure, diabetes, and breast cancer. It compared them using six performance metrics: Adjusted Rand Index, Adjusted Mutual Information, Homogeneity, Completeness, V-measure, and Silhouette Coefficient.

Results: Among the seven unsupervised ML methods, DBSCAN (density-based spatial clustering of applications with noise) achieved the best performance most often (31 times), followed by the Bayesian Gaussian Mixture (18) and Divisive clustering (15). No single model consistently outperformed the others across every dataset and metric. The study emphasises that the model and the performance measure should be selected according to application-specific needs. For example, DBSCAN excels on the Homogeneity, Completeness, and V-measure metrics, whereas the Bayesian Gaussian Mixture performs well on the Adjusted Rand Index. The code used in this study is available at https://github.com/haohuilu/unsupervisedml/.

Conclusion: This research contributes deeper insights into unsupervised ML applications in healthcare and encourages further investigation into model selection. Subsequent studies could use genuine disease records for a more nuanced comparison and evaluation of models.
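The kind of comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code (which is in the linked repository): it assumes scikit-learn, uses its bundled breast cancer dataset as a stand-in for one of the 15 datasets, and runs only two of the named algorithms. The helper `evaluate_clustering` and all hyperparameter values (`eps`, `min_samples`, `n_components`) are hypothetical choices made for the example.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    silhouette_score,
    v_measure_score,
)
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler


def evaluate_clustering(X, y_true, labels):
    """Score one clustering with the six metrics named in the abstract.

    Hypothetical helper for illustration only.
    """
    scores = {
        "ARI": adjusted_rand_score(y_true, labels),
        "AMI": adjusted_mutual_info_score(y_true, labels),
        "Homogeneity": homogeneity_score(y_true, labels),
        "Completeness": completeness_score(y_true, labels),
        "V-measure": v_measure_score(y_true, labels),
    }
    # The silhouette coefficient is only defined when there are at least
    # two distinct labels and fewer labels than samples.
    n_labels = len(set(labels))
    if 1 < n_labels < len(labels):
        scores["Silhouette"] = silhouette_score(X, labels)
    else:
        scores["Silhouette"] = float("nan")
    return scores


# Stand-in dataset (an assumption, not necessarily one used in the paper).
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Illustrative hyperparameters, not the values used in the study.
models = {
    "DBSCAN": DBSCAN(eps=3.0, min_samples=5),
    "BayesianGaussianMixture": BayesianGaussianMixture(
        n_components=2, random_state=0
    ),
}

# Ground-truth labels are used only for scoring, never for fitting.
results = {
    name: evaluate_clustering(X, y, model.fit_predict(X))
    for name, model in models.items()
}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Five of the six metrics compare the cluster assignment against the known diagnosis labels, while the Silhouette Coefficient depends only on the data and the assignment, which is one reason the ranking of models can differ from metric to metric.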

Pages: 141-154

Page count: 14
