Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

Cited by: 2
Authors
Lu, Haohui [1]
Uddin, Shahadat [1]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Project Management, Level 2,21 Ross St, Forest Lodge, NSW 2037, Australia
Keywords
Disease prediction; Performance comparison; Unsupervised machine learning; Healthcare dataset; Diagnosis
DOI
10.1007/s12553-023-00805-8
Chinese Library Classification (CLC) number
R-058
Abstract
Purpose: Disease risk prediction poses a significant and growing challenge in the medical field. While researchers have increasingly utilised machine learning (ML) algorithms to tackle this issue, supervised ML methods remain dominant. However, there is a rising interest in unsupervised techniques, especially in situations where data labels might be missing - as seen with undiagnosed or rare diseases. This study compares unsupervised ML models for disease prediction.

Methods: This study evaluated the efficacy of seven unsupervised algorithms on 15 datasets, including those of heart failure, diabetes, and breast cancer. It used six performance metrics for this comparison: Adjusted Rand Index, Adjusted Mutual Information, Homogeneity, Completeness, V-measure and Silhouette Coefficient.

Results: Among the seven unsupervised ML methods, DBSCAN (Density-based spatial clustering of applications with noise) showed the best performance most times (31), followed by the Bayesian Gaussian Mixture (18) and Divisive clustering (15). No single model consistently outshined others across every dataset and metric. The study emphasises the crucial role of model and performance measure selections based on application-specific needs. For example, DBSCAN excels in the Homogeneity, Completeness and V-measure metrics. Conversely, the Bayesian Gaussian Mixture performs well on the Adjusted Rand Index metric. The codes used in this study can be found at https://github.com/haohuilu/unsupervisedml/.

Conclusion: This research contributes deeper insights into the unsupervised ML applications in healthcare and encourages further investigations into model selection. Subsequent studies could harness genuine disease records for a more nuanced comparison and evaluation of models.
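To make the evaluation setup in the abstract concrete, the following is a minimal sketch of scoring two of the named algorithms (DBSCAN and Bayesian Gaussian Mixture) with the six listed metrics using scikit-learn. It is not the authors' code (that is in the repository linked above). The dataset (scikit-learn's bundled breast cancer data), the helper name `evaluate_clustering`, the standardisation step and all hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    silhouette_score,
    v_measure_score,
)
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler


def evaluate_clustering(X, y_true, y_pred):
    """Hypothetical helper: score one clustering with the six metrics."""
    n_groups = len(set(y_pred))
    # The Silhouette Coefficient is undefined unless there are at least
    # two distinct labels. DBSCAN noise points (label -1) are simply
    # counted as one more group here, which is a simplification.
    if 2 <= n_groups <= len(X) - 1:
        silhouette = silhouette_score(X, y_pred)
    else:
        silhouette = float("nan")
    return {
        "Adjusted Rand Index": adjusted_rand_score(y_true, y_pred),
        "Adjusted Mutual Information": adjusted_mutual_info_score(y_true, y_pred),
        "Homogeneity": homogeneity_score(y_true, y_pred),
        "Completeness": completeness_score(y_true, y_pred),
        "V-measure": v_measure_score(y_true, y_pred),
        "Silhouette Coefficient": silhouette,
    }


# Assumed example data: labels are used only for scoring, never for fitting.
X_raw, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X_raw)

# Illustrative hyperparameters, not the values used in the study.
models = {
    "DBSCAN": DBSCAN(eps=3.0, min_samples=5),
    "BayesianGaussianMixture": BayesianGaussianMixture(
        n_components=2, max_iter=500, random_state=0
    ),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = evaluate_clustering(X, y, labels)

for name, scores in results.items():
    print(name)
    for metric, value in scores.items():
        print(f"  {metric}: {value:.3f}")
```

Five of the six metrics compare predicted clusters against known labels, while the Silhouette Coefficient uses only the feature matrix. This is one reason rankings can differ by metric, as the abstract reports.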
Pages: 141-154
Page count: 14