A cluster-based ensemble approach for congenital heart disease prediction

Cited by: 6
Authors
Kaur, Ishleen [1]
Ahmad, Tanvir [2]
Affiliations
[1] Univ Delhi, Sri Guru Tegh Bahadur Khalsa Coll, Delhi, India
[2] Jamia Millia Islamia, Dept Comp Engn, New Delhi, India
Keywords
Congenital heart disease; DBSCAN; Ensemble; Machine learning; Random forest; Diagnosis; Defects; Trends
DOI
10.1016/j.cmpb.2023.107922
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Background: Congenital heart disease (CHD) is one of the most prevalent birth disorders. Although CHD risk factors have been the subject of numerous studies, their propensity to cause CHD has not been tested. In particular, few studies have attempted to forecast CHD risk from population-based cross-sectional data, which is inherently imbalanced.

Objective: The main goals of this study are to create a reliable data analysis model that helps with (i) better understanding congenital heart disease prediction in the presence of missing and imbalanced data and (ii) creating cohorts of expectant mothers with similar lifestyle characteristics.

Methods: Patient cohorts are produced with the unsupervised data mining technique density-based spatial clustering of applications with noise (DBSCAN). For more accurate CHD prediction, a random forest model was trained using these clusters and their corresponding patterns. The study makes its predictions on a dataset of 33,831 expectant mothers. Missing data were handled with k-nearest-neighbour (k-NN) imputation, and the highly imbalanced data were balanced with SMOTE (Synthetic Minority Over-sampling Technique). All of these techniques are data-driven and need little or no user or expert involvement.

Results and Conclusion: DBSCAN identified three cohorts. The cluster information enhanced random forest-based CHD prediction and revealed intricate factors that influence prediction accuracy. The proposed approach gave the best results, with 99% accuracy and an AUC of 0.91, and outperformed state-of-the-art methods. Hence, the suggested use of unsupervised learning can provide intricate information to the classifier and further enhance classification performance.
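The abstract names k-NN imputation for missing values but does not give its settings. As a rough illustration only, the sketch below fills each missing entry with the mean of that feature over the k nearest records that observed it. The function name `knn_impute`, the use of `None` for missing values, the default `k`, and the distance rule (Euclidean over features both records observed, rescaled by the share of observed features) are assumptions, not the authors' configuration.

```python
import math


def knn_impute(rows, k=3):
    """Hypothetical helper: fill None entries from the k nearest rows.

    Assumed distance: Euclidean over features observed in both rows,
    rescaled so rows with fewer shared features remain comparable.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing in common, cannot compare
        return math.sqrt(sum(shared) * len(a) / len(shared))

    filled = [list(r) for r in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that observed feature j.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled
```

For example, `knn_impute([[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.2]], k=2)` fills the gap with 2.1, the mean of the two closest records' values; the distant record `[5.0, 10.0]` is ignored.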
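SMOTE balances the classes by creating synthetic minority-class records: it picks a minority record, picks one of its k nearest minority-class neighbours, and places a new point at a random position on the segment between them. The plain-Python sketch below shows only that interpolation step. The name `smote`, the default `k`, and the fixed `seed` are illustrative assumptions; the abstract does not report the SMOTE settings used in the study.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: return n_synthetic points interpolated
    between minority-class points (SMOTE-style oversampling)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)  # fixed seed for reproducible output
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority-class neighbours of the base point (itself excluded).
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment from base towards neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

To balance two classes, one would request `len(majority) - len(minority)` synthetic records and append them to the minority class before training the classifier.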
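Cohort discovery in the abstract rests on DBSCAN, which grows clusters from "core" points that have at least `min_pts` neighbours within radius `eps` and labels isolated points as noise. A minimal plain-Python sketch follows; the values of `eps` and `min_pts` are assumptions, since the abstract does not report the parameters that produced the three cohorts. The abstract also does not say exactly how the cluster information reaches the random forest, so `with_cluster_feature`, which appends the cohort label as an extra input column, is a hypothetical option only.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    min_pts counts the point itself, as scikit-learn's min_samples does.
    """
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
        cluster += 1
    return labels


def with_cluster_feature(rows, labels):
    """Hypothetical: append each row's cohort label as an extra feature."""
    return [list(r) + [label] for r, label in zip(rows, labels)]
```

On two tight groups of three points plus one far-away point, `dbscan(points, eps=1.5, min_pts=3)` returns two clusters and marks the outlier as `-1`; the augmented rows could then be passed to any random forest implementation.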
Pages: 7