Prediction of hepatitis E using machine learning models

Cited by: 22
Authors
Guo, Yanhui [1]
Feng, Yi [2, 3]
Qu, Fuli [1]
Zhang, Li [2, 3]
Yan, Bingyu [2, 3]
Lv, Jingjing [2, 3]
Affiliations
[1] Shandong Womens Univ, Sch Data & Comp Sci, Jinan, Shandong, Peoples R China
[2] Shandong Ctr Dis Control & Prevent, Shandong Prov Key Lab Infect Dis Control & Preven, Jinan, Shandong, Peoples R China
[3] Shandong Univ, Acad Prevent Med, Jinan, Shandong, Peoples R China
Source
PLOS ONE | 2020 / Vol. 15 / Issue 09
DOI
10.1371/journal.pone.0237750
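Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract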
Background: Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task. However, the performance of these models varies across data series. Hepatitis E, an acute liver disease, has been a major public health problem. Which model is more appropriate for predicting the incidence of hepatitis E? In this paper, three different methods are used and their performance is compared.

Methods: Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with statsmodels. SVM was implemented in MATLAB with the LIBSVM library. We designed the LSTM ourselves with Keras, a deep learning library. To tackle the problem of overfitting caused by limited training samples, we adopted dropout and regularization strategies in our LSTM model. Experimental data were the monthly incidence and number of cases of hepatitis E from January 2005 to December 2017 in Shandong province, China. We selected data from July 2015 to December 2017 to validate the models, and the rest was used as the training set. Three metrics were applied to compare model performance: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE).

Results: Based on analysis of the data, we took ARIMA(1, 1, 1) and ARIMA(3, 1, 2) as the monthly incidence prediction model and the case number prediction model, respectively. Cross-validation and grid search were used to optimize the SVM parameters. The penalty coefficient C and kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for case number prediction. The LSTM has 4 nodes. The dropout and L2 regularization parameters were set to 0.15 and 0.001, respectively. For RMSE, we obtained 0.022, 0.0204 and 0.01 for incidence prediction using ARIMA, SVM and LSTM, respectively, and 22.25, 20.0368 and 11.75 for case number prediction. For MAPE, the results were 23.5%, 21.7% and 15.08% for incidence prediction, and 23.6%, 21.44% and 13.6% for case number prediction. For MAE, the results were 0.018, 0.0167 and 0.011 for incidence prediction, and 18.003, 16.5815 and 9.984 for case number prediction.

Conclusions: Comparing ARIMA, SVM and LSTM, we found that the nonlinear models (SVM, LSTM) outperform the linear model (ARIMA). LSTM obtained the best performance on all three metrics (RMSE, MAPE and MAE). Hence, LSTM is the most suitable for predicting hepatitis E monthly incidence and case numbers.
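The evaluation setup in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `month_index` helper and the placeholder series are assumptions made for illustration, and the metric definitions are the standard textbook ones, which the abstract names but does not spell out. Only the date ranges (January 2005 to December 2017, validation from July 2015) come from the abstract.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def month_index(year, month, start_year=2005, start_month=1):
    """Hypothetical helper: zero-based position of a month in a series starting January 2005."""
    return (year - start_year) * 12 + (month - start_month)

# Chronological split described in the abstract: validate on July 2015 to December 2017.
n_months = month_index(2017, 12) + 1   # total months in the series
split = month_index(2015, 7)           # first validation month
series = list(range(n_months))         # placeholder values, NOT the real incidence data
train, test = series[:split], series[split:]
print(len(train), len(test))
```

Under these assumptions the split leaves 126 training months and 30 validation months. Because the data are a time series, the split is chronological rather than random, so every validation month lies after the last training month.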
Pages: 12