Comparative analysis of machine learning and ensemble approaches for hepatitis B prediction using data mining with synthetic minority oversampling technique

Cited by: 0
Authors
Alizargar, Azadeh [1 ]
Chang, Yang-Lang [1 ]
Tan, Tan-Hsu [1 ]
Liu, Tsung-Yu [2 ]
Affiliations
[1] Natl Taipei Univ Technol, Coll Elect Engn & Comp Sci, Dept Elect Engn, Taipei 10608, Taiwan
[2] Lunghwa Univ Sci & Technol, Dept Multimedia & Game Sci, Taoyuan 333326, Taiwan
Keywords
Hepatitis B; Liver damage; Early detection; Machine learning; Ensemble model; SMOTE; RISK; DIAGNOSIS; VIRUS
DOI
10.1007/s12553-023-00802-x
Chinese Library Classification (CLC) number
R-058
Abstract
Purpose: Hepatitis B, caused by the Hepatitis B virus (HBV), can harm the liver without noticeable symptoms. Early detection is crucial to prevent transmission and enhance recovery. The main goal is to predict Hepatitis B through cost-effective lab test data, by utilizing machine learning. The primary focus is on evaluating the effectiveness of various algorithms in predicting the disease and their potential to enhance early diagnosis capabilities.

Methods: Six distinct algorithms (Support Vector Machine, K-nearest Neighbors, Logistic Regression, decision tree, extreme gradient boosting, random forest) were employed alongside an ensemble model. Analysis involved two rounds: considering all features and key attributes. The Synthetic Minority Oversampling Technique (SMOTE) was employed for data imbalance. Various metrics, including the confusion matrix, precision, recall, F1 score, accuracy, receiver operating characteristics (ROC) curve, area under the curve (AUC), and mean absolute error (MAE), were utilized to assess the efficacy of each predictive technique. The National Health and Nutrition Examination Survey (NHANES) dataset was employed.

Results: The experimental results demonstrate that the ensemble model attained the highest accuracy (97%) and AUC (0.997) in comparison to existing models. The analysis revealed that specific crucial features possess substantial predictive significance within this model.

Conclusion: The study underscores the potential of the ensemble model as a valuable tool for medical practitioners, leveraging cost-effective and readily obtainable laboratory test data to predict Hepatitis B with remarkable accuracy. By facilitating early diagnosis and intervention, this research presents a promising avenue to enhance patient outcomes in the context of Hepatitis B.
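
For readers unfamiliar with the oversampling step named in the Methods, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is a random interpolation between an existing minority sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and no NHANES data is used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Illustrative sketch only: `minority` is a list of equal-length numeric
    feature vectors; names and defaults here are hypothetical.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 4 minority samples to be balanced against 10 majority ones.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10
new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))  # prints 10
```

In practice the oversampling would be applied to the training split only, so that synthetic points do not leak into the data used to compute accuracy or AUC.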
Pages: 109-118
Page count: 10