Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

Cited by: 2
Authors
Lu, Haohui [1]
Uddin, Shahadat [1]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Project Management, Level 2, 21 Ross St, Forest Lodge, NSW 2037, Australia
Keywords
Disease prediction; Performance comparison; Unsupervised machine learning; Healthcare dataset; DIAGNOSIS
DOI
10.1007/s12553-023-00805-8
Chinese Library Classification (CLC) number
R-058
Abstract
Purpose: Disease risk prediction poses a significant and growing challenge in the medical field. While researchers have increasingly utilised machine learning (ML) algorithms to tackle this issue, supervised ML methods remain dominant. However, there is rising interest in unsupervised techniques, especially in situations where data labels might be missing, as with undiagnosed or rare diseases. This study compares unsupervised ML models for disease prediction.

Methods: This study evaluated the efficacy of seven unsupervised algorithms on 15 datasets, including those of heart failure, diabetes, and breast cancer. Six performance metrics were used for the comparison: Adjusted Rand Index, Adjusted Mutual Information, Homogeneity, Completeness, V-measure and Silhouette Coefficient.

Results: Among the seven unsupervised ML methods, DBSCAN (density-based spatial clustering of applications with noise) achieved the best performance most often (31 times), followed by the Bayesian Gaussian Mixture (18) and Divisive clustering (15). No single model consistently outperformed the others across every dataset and metric. The study emphasises the crucial role of selecting the model and the performance measure according to application-specific needs. For example, DBSCAN excels in the Homogeneity, Completeness and V-measure metrics, whereas the Bayesian Gaussian Mixture performs well on the Adjusted Rand Index. The code used in this study can be found at https://github.com/haohuilu/unsupervisedml/.

Conclusion: This research contributes deeper insights into unsupervised ML applications in healthcare and encourages further investigation into model selection. Subsequent studies could harness genuine disease records for a more nuanced comparison and evaluation of models.
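Four of the six metrics named in the abstract compare cluster assignments against known disease labels. As a rough illustration of what they measure, the sketch below computes Homogeneity, Completeness, V-measure and the Adjusted Rand Index from label counts using only the Python standard library. The function names and the toy labels are assumptions made for this example; they are not taken from the authors' repository, which may compute these metrics differently (scikit-learn, for instance, offers `homogeneity_completeness_v_measure` and `adjusted_rand_score` in `sklearn.metrics`).

```python
from collections import Counter
from math import comb, log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(true_labels, cluster_labels):
    """Return (homogeneity, completeness, V-measure) for two labelings."""
    h_class = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = (
        1.0 if h_class == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_class
    )
    # Completeness: each class should be assigned to a single cluster.
    completeness = (
        1.0 if h_cluster == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_cluster
    )
    total = homogeneity + completeness
    # V-measure: harmonic mean of homogeneity and completeness.
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


def adjusted_rand_index(true_labels, cluster_labels):
    """Rand index over sample pairs, corrected for chance agreement."""
    n = len(true_labels)
    if n < 2:
        return 1.0
    pairs_joint = sum(
        comb(c, 2) for c in Counter(zip(true_labels, cluster_labels)).values()
    )
    pairs_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    pairs_cluster = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    expected = pairs_true * pairs_cluster / comb(n, 2)
    max_index = (pairs_true + pairs_cluster) / 2
    if max_index == expected:
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels (hypothetical): 0 = healthy, 1 = diseased.
    diagnosis = [0, 0, 0, 1, 1, 1]
    clusters = [1, 1, 1, 0, 0, 0]  # same grouping, different cluster ids
    print(homogeneity_completeness_v(diagnosis, clusters))
    print(adjusted_rand_index(diagnosis, clusters))
```

All four scores depend only on how samples are grouped, not on the cluster ids themselves, which is why they suit clustering output whose labels are arbitrary. The Silhouette Coefficient, by contrast, is computed from the feature distances and needs no ground-truth labels.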
Pages: 141-154
Number of pages: 14
Source: Health and Technology, 2024, 14: 141-154