Hepatitis C Virus Detection Model by Using Random Forest, Logistic-Regression and ABC Algorithm

Cited by: 10
Authors
Li, Tzuu-Hseng S. [1]
Chiu, Huan-Jung [1]
Kuo, Ping-Huan [2]
Affiliations
[1] Natl Cheng Kung Univ, Dept Elect Engn, aiRobots Lab, Tainan 70101, Taiwan
[2] Natl Chung Cheng Univ, Dept Mech Engn, Chiayi 62102, Taiwan
Keywords
Liver diseases; Classification tree analysis; Random forests; Data models; Classification algorithms; Artificial bee colony algorithm; Medical diagnostic imaging; Monte Carlo methods; Sampling methods; Random forest; logistic regression; two-stage mixing; ABC algorithm; 10-fold Monte-Carlo cross-validation; synthetic minority oversampling technique; Disease diagnosis; Liver disease; Classification; Prediction
DOI
10.1109/ACCESS.2022.3202295
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study proposes an automatic classifier that estimates the multiclass probabilities of hepatitis C virus (HCV) incidence from patients' blood attributes. The aim is to establish an artificial intelligence-based model that can identify HCV patients and detect the disease at an early stage to support future treatment. The model can be applied to clinical data and maintains its performance on imbalanced datasets. The innovation of this article lies in addressing the class imbalance present in clinical data derived from medical records, for which the synthetic minority oversampling technique (SMOTE) was employed. The classifier uses a cascade two-stage method combining the random forest (RF) and logistic regression (LR) algorithms. Two models were trained by applying RF (Model 1) and LR (Model 2) to raw and preprocessed data, respectively. The artificial bee colony (ABC) algorithm was then used to search for the optimal threshold value for filtering and separation between Model 1 and Model 2, that is, the optimal combination of the two models. This two-stage mixing algorithm combines algorithms of different search dimensions, thus integrating their strengths. After conducting 10-fold Monte Carlo cross-validation experiments 50 times (for mean values), data from the recent pandemic were used to verify the proposed method. For quantitative evaluation, indicators such as prediction accuracy, precision, recall, F1-score, and Matthews correlation coefficient were compared with those of the latest algorithms used in relevant fields. The results indicate that the proposed model, named Cascade RF-LR (with SMOTE) and combined using the ABC algorithm, can detect the multiclass probabilities of HCV incidence, thereby improving the effectiveness of relevant treatments.
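The cascade idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the rule "accept Model 1 when its top class probability reaches the threshold, otherwise defer to Model 2" is one plausible reading of the filtering step, a plain grid search stands in for the ABC algorithm, accuracy is used as the objective, and the probabilities and labels are made-up toy values. The function names `cascade_predict` and `search_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

Probs = Sequence[float]  # class probabilities for one sample


def argmax(p: Probs) -> int:
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(p)), key=lambda k: p[k])


def cascade_predict(p1: Probs, p2: Probs, threshold: float) -> int:
    """Assumed cascade rule: keep Model 1's class when it is confident
    enough, otherwise fall back to Model 2's class."""
    best1 = argmax(p1)
    if p1[best1] >= threshold:
        return best1
    return argmax(p2)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def search_threshold(
    probs1: List[Probs],
    probs2: List[Probs],
    y_true: Sequence[int],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Grid search over candidate thresholds (a stand-in for ABC, not ABC).
    Returns the first candidate that reaches the best accuracy."""
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        preds = [cascade_predict(a, b, t) for a, b in zip(probs1, probs2)]
        acc = accuracy(y_true, preds)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy inputs (invented): per-sample class probabilities from each model.
probs_model1 = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.2, 0.8]]
probs_model2 = [[0.7, 0.3], [0.3, 0.7], [0.2, 0.8], [0.4, 0.6]]
labels = [0, 1, 1, 1]
candidates = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

best_t, best_acc = search_threshold(probs_model1, probs_model2, labels, candidates)
print(best_t, best_acc)
```

In practice the probability lists would come from fitted RF and LR models evaluated on held-out data, and the threshold should be tuned on a validation split rather than on the data used for the final report.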
Pages: 91045-91058
Number of pages: 14