Hepatitis C Virus Detection Model by Using Random Forest, Logistic-Regression and ABC Algorithm

Cited by: 10
Authors
Li, Tzuu-Hseng S. [1]
Chiu, Huan-Jung [1]
Kuo, Ping-Huan [2]
Affiliations
[1] Natl Cheng Kung Univ, Dept Elect Engn, aiRobots Lab, Tainan 70101, Taiwan
[2] Natl Chung Cheng Univ, Dept Mech Engn, Chiayi 62102, Taiwan
Keywords
Liver diseases; Classification tree analysis; Random forests; Data models; Classification algorithms; Artificial bee colony algorithm; Medical diagnostic imaging; Monte Carlo methods; Sampling methods; Random forest; logistic regression; two-stage mixing; ABC algorithm; 10-fold Monte-Carlo cross-validation; synthetic minority oversampling technique; DISEASE DIAGNOSIS; LIVER-DISEASE; CLASSIFICATION; PREDICTION;
DOI
10.1109/ACCESS.2022.3202295
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study proposes an automatic classifier for estimating the multiclass probabilities of hepatitis C virus (HCV) incidence from patients' blood attributes. The aim is to establish an artificial intelligence-based model that can identify HCV patients and detect the disease at an early stage to support subsequent treatment. The model can be applied to clinical data and maintains its performance on imbalanced datasets. The innovation of this article lies in addressing the imbalanced data found in medical record-based clinical data, for which the synthetic minority oversampling technique (SMOTE) was employed. The objective was achieved using a cascade two-stage method combining the random forest (RF) and logistic regression (LR) algorithms. Two models were trained by applying RF (Model 1) and LR (Model 2) to raw and preprocessed data, respectively. The two-stage mixing algorithm combines algorithms of different search dimensions, thus integrating their strengths. The artificial bee colony (ABC) algorithm was then used in an optimized search for the critical threshold value required for filtering and for separating Model 1 and Model 2, that is, the optimal combination of both models. After 10-fold Monte Carlo cross-validation experiments were conducted 50 times (to obtain mean values), data from the recent pandemic were used to verify the proposed method. For quantitative evaluation, indicators such as prediction accuracy, precision, recall, F1-score, and Matthews correlation coefficient were compared with those of the latest algorithms used in relevant fields. The results indicate that the proposed model, named Cascade RF-LR (with SMOTE), with its threshold tuned by the ABC algorithm, can be used to detect the multiclass probabilities of HCV incidence, thereby improving the effectiveness of relevant treatments.
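The threshold-based combination of two models described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is applied, so the rule here is an assumption: a sample keeps the Model 1 prediction when Model 1's top class probability reaches the threshold, and is otherwise passed to Model 2. A plain grid search stands in for the ABC algorithm, and accuracy is used as the objective. The function names (`cascade_predict`, `search_threshold`) and the probability lists are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def cascade_predict(p1: Sequence[Sequence[float]],
                    p2: Sequence[Sequence[float]],
                    threshold: float) -> List[int]:
    """Combine two models' class probabilities with a confidence threshold.

    ASSUMPTION: keep Model 1's prediction when its top probability is at
    least `threshold`; otherwise fall back to Model 2's prediction.
    """
    preds = []
    for probs1, probs2 in zip(p1, p2):
        top1 = argmax(probs1)
        if probs1[top1] >= threshold:
            preds.append(top1)            # stage 1 is confident enough
        else:
            preds.append(argmax(probs2))  # defer to stage 2
    return preds


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def search_threshold(p1, p2, y_true, steps: int = 101) -> Tuple[float, float]:
    """Grid search over [0, 1]; a simple stand-in for the ABC optimizer."""
    best_t, best_acc = 0.0, -1.0
    for i in range(steps):
        t = i / (steps - 1)
        acc = accuracy(y_true, cascade_predict(p1, p2, t))
        if acc > best_acc:  # keep the first threshold reaching the best score
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Hypothetical validation-set probabilities for a two-class toy problem.
    model1 = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
    model2 = [[0.7, 0.3], [0.3, 0.7], [0.4, 0.6], [0.1, 0.9]]
    labels = [0, 1, 1, 1]
    print(search_threshold(model1, model2, labels))
```

In practice the threshold would be tuned on held-out folds rather than on the data used for the final evaluation, and a population-based optimizer such as ABC would replace the grid when the search space has more than one dimension.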
Pages: 91045-91058
Page count: 14
Related papers
50 in total
  • [31] Using the Analysis of Logistic Regression Model in Auditing and Detection of Frauds
    Boztepe, Engin
    Usul, Hayrettin
    KHAZAR JOURNAL OF HUMANITIES AND SOCIAL SCIENCES, 2019, 22 (03): : 5 - 23
  • [32] DOWNSCALING LAND SURFACE TEMPERATURE BY USING RANDOM FOREST REGRESSION ALGORITHM
    Li, Wan
    Ni, Li
    Li, Zhao-Liang
    Wu, Hua
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 2527 - 2530
  • [33] Comparative Analysis of Gaussian Mixture Model, Logistic Regression and Random Forest for Big Data Classification using Map Reduce
    Singh, Vikas
    Gupta, Rahul K.
    Sevakula, Rahul K.
    Verma, Nishchal K.
    2016 11TH INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (ICIIS), 2016, : 333 - 338
  • [34] Empirical Analysis of Financial Statement Fraud of Listed Companies Based on Logistic Regression and Random Forest Algorithm
    Liu, Xinchun
    JOURNAL OF MATHEMATICS, 2021, 2021
  • [35] Web URLs Phishing Detection Model with Random Forest Algorithm
    Putri, Aulia Kharisma
    Wiratama, Jansen
    Sanjaya, Samuel Ady
    Wijaya, Santo Fernandi
    Johan, Monika Evelin
    Faza, Ahmad
    2024 5TH INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS AND PRACTICES, IBDAP, 2024, : 1 - 5
  • [36] Design of Random Forest Algorithm Based Model for Tachycardia Detection
    Mohapatra, Saumendra Kumar
    Swarnkar, Tripti
    Mohanty, Mihir Narayan
    ADVANCED COMPUTING AND INTELLIGENT ENGINEERING, 2020, 1082 : 191 - 199
  • [37] Risk prediction model for early postoperative death in patients with hepatocellular carcinoma: a retrospective study based on random forest algorithm and logistic regression
    Gao, Yang
    Wu, Fu-gui
    Guo, Wen-bo
    Zheng, Hao
    Zhang, Lu
    Chen, Xiu-li
    Li, Man
    EUROPEAN JOURNAL OF GASTROENTEROLOGY & HEPATOLOGY, 2022, 34 (12) : 1247 - 1254
  • [38] Forest cover dynamics analysis and prediction modeling using logistic regression model
    Kumar, Rakesh
    Nandy, S.
    Agarwal, Reshu
    Kushwaha, S. P. S.
    ECOLOGICAL INDICATORS, 2014, 45 : 444 - 455
  • [39] Modeling Anthropogenic Fire Occurrence in the Boreal Forest of China Using Logistic Regression and Random Forests
    Guo, Futao
    Zhang, Lianjun
    Jin, Sen
    Tigabu, Mulualem
    Su, Zhangwen
    Wang, Wenhui
    FORESTS, 2016, 7 (11):
  • [40] An Effective Algorithm for Intrusion Detection Using Random Shapelet Forest
    Li, Gongliang
    Yin, Mingyong
    Jing, Siyuan
    Guo, Bing
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021 (2021):