Hepatitis C Virus Detection Model by Using Random Forest, Logistic-Regression and ABC Algorithm

Cited by: 10
Authors
Li, Tzuu-Hseng S. [1]
Chiu, Huan-Jung [1]
Kuo, Ping-Huan [2]
Affiliations
[1] Natl Cheng Kung Univ, Dept Elect Engn, aiRobots Lab, Tainan 70101, Taiwan
[2] Natl Chung Cheng Univ, Dept Mech Engn, Chiayi 62102, Taiwan
Keywords
Liver diseases; Classification tree analysis; Random forests; Data models; Classification algorithms; Artificial bee colony algorithm; Medical diagnostic imaging; Monte Carlo methods; Sampling methods; Random forest; logistic regression; two-stage mixing; ABC algorithm; 10-fold Monte-Carlo cross-validation; synthetic minority oversampling technique; DISEASE DIAGNOSIS; LIVER-DISEASE; CLASSIFICATION; PREDICTION
DOI
10.1109/ACCESS.2022.3202295
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study proposes an automatic classifier for detecting the multiclass probabilities of hepatitis C virus (HCV) incidence based on patients' blood attributes. The purpose of this study is to establish an artificial intelligence-based model that can identify HCV patients and detect the disease at an early stage for future treatment. The model can be applied to clinical data and maintains its performance on imbalanced datasets. The innovation of this article lies in addressing the "unbalanced data" present in medical record-based clinical data; the synthetic minority oversampling technique (SMOTE) was employed to handle this imbalance. The classifier was built using a cascade two-stage method combining the random forest (RF) and logistic regression (LR) algorithms. Two models were trained by applying RF (Model 1) and LR (Model 2) to raw and preprocessed data, respectively. The artificial bee colony (ABC) algorithm was then used to search for the optimal threshold value required for filtering and separation, that is, the critical value that separates Model 1 from Model 2 and gives the optimal combination of both models. The two-stage mixing algorithm combines algorithms of different search dimensions, thus integrating the strengths of those algorithms. After conducting 10-fold Monte Carlo cross-validation experiments 50 times (for mean values), data from the recent pandemic were used to verify the proposed method. To evaluate the results quantitatively, indicators such as prediction accuracy, precision, recall, F1-score, and Matthews correlation coefficient were compared with those of the latest algorithms used in relevant fields. The results indicate that the proposed model, named Cascade RF-LR (with SMOTE), can be used to detect the multiclass probabilities of HCV incidence using the ABC algorithm, thereby improving the effectiveness of relevant treatments.
Pages: 91045-91058
Number of pages: 14
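
The abstract describes a cascade in which a threshold decides how Model 1 (RF) and Model 2 (LR) are combined, but it does not state the exact routing rule. The following is a minimal sketch under a labeled assumption: Model 1 is trusted when its highest class probability reaches the threshold, and the sample is otherwise deferred to Model 2. The function names, the toy probabilities, and the plain grid search (used here as a simple stand-in for the ABC algorithm) are all illustrative and are not taken from the paper.

```python
from typing import List, Sequence

Probs = Sequence[float]  # one class-probability vector per sample


def cascade_predict(p1: Probs, p2: Probs, threshold: float) -> int:
    """Return a class index from a two-stage cascade.

    Assumption (not from the paper): Model 1's output is accepted when its
    top probability is at least `threshold`; otherwise Model 2 decides.
    """
    probs = p1 if max(p1) >= threshold else p2
    # argmax; ties resolve to the lowest class index
    return max(range(len(probs)), key=lambda i: probs[i])


def accuracy(y_true: Sequence[int], p1_all: Sequence[Probs],
             p2_all: Sequence[Probs], threshold: float) -> float:
    """Fraction of samples the cascade classifies correctly."""
    hits = sum(
        cascade_predict(p1, p2, threshold) == y
        for y, p1, p2 in zip(y_true, p1_all, p2_all)
    )
    return hits / len(y_true)


def search_threshold(y_true: Sequence[int], p1_all: Sequence[Probs],
                     p2_all: Sequence[Probs], steps: int = 20) -> float:
    """Pick the threshold with the best accuracy on the given data.

    A uniform grid search replaces the ABC optimizer purely for brevity.
    """
    candidates: List[float] = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: accuracy(y_true, p1_all, p2_all, t))


# Toy, hand-made probabilities for three classes (hypothetical data).
y_true = [0, 1, 2, 1]
p1_all = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.1, 0.8], [0.5, 0.3, 0.2]]
p2_all = [[0.2, 0.7, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4], [0.1, 0.8, 0.1]]

best = search_threshold(y_true, p1_all, p2_all)
print(best, accuracy(y_true, p1_all, p2_all, best))
```

In a real pipeline the threshold would be tuned on held-out folds rather than on the data used to report accuracy, and the probability vectors would come from fitted RF and LR models instead of hand-written lists.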