Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers

Cited by: 6
Author
Alshayeji, Mohammad H. [1]
Affiliation
[1] Kuwait Univ, Coll Engn & Petr, Dept Comp Engn, POB 5969, Safat 13060, Kuwait
Source
Keywords
machine learning; thyroid; data mining; ensemble model; feature engineering; SMOTE;
DOI
10.3390/make5030061
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Thyroid disease is among the most prevalent endocrinopathies worldwide. As the thyroid gland controls human metabolism, thyroid illness is a matter of concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate machine-learning (ML) system for identifying thyroid disease is essential. The proposed model aims to address limitations of existing work, such as the lack of detailed feature analysis and visualization and the need for better prediction accuracy and reliability. Here, a public thyroid illness dataset containing 29 clinical features from the University of California, Irvine ML repository was used. The clinical features helped us to build an ML model that can predict thyroid illness by analyzing early symptoms, replacing the manual analysis of these attributes. Feature analysis and visualization facilitate an understanding of the role of features in thyroid prediction tasks. In addition, the overfitting problem was eliminated by 5-fold cross-validation and data balancing using the synthetic minority oversampling technique (SMOTE). Ensemble learning ensures prediction model reliability owing to the involvement of multiple classifiers in the prediction decisions. With the boosting method, the proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity, making it applicable to real-time computer-aided diagnosis (CAD) systems to ease diagnosis and promote early treatment.
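As a rough illustration of two ideas named in the abstract, the sketch below shows the core interpolation step behind SMOTE and how accuracy, sensitivity, and specificity are computed from binary predictions. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `smote_sketch` and `binary_metrics`, the parameter `k`, and the toy data are all hypothetical, and no part of the thyroid dataset is used.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (hypothetical, not taken from the thyroid dataset)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

In a cross-validated setting, oversampling of this kind is normally applied only to the training folds, so that synthetic points never leak into the fold used for evaluation; whether the paper follows that order is not stated in the abstract.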
Pages: 1195-1213
Page count: 19