Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study

Cited by: 1
Authors
Forna, Alpha [1]
Dorigatti, Ilaria [2]
Nouvellet, Pierre [2, 3]
Donnelly, Christl A. [2, 4]
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[2] Imperial Coll London, MRC, Dept Infect Dis Epidemiol, Ctr Global Infect Dis Anal, London, England
[3] Univ Sussex, Sch Life Sci, Brighton, E Sussex, England
[4] Univ Oxford, Dept Stat, Oxford, England
Source
PLOS ONE | 2021 / Vol. 16 / Issue 09
Funding
Wellcome Trust (UK); Medical Research Council (UK)
Keywords
MISSING DATA; DISEASE;
DOI
10.1371/journal.pone.0257005
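Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract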
Background: Machine learning (ML) algorithms are now increasingly used in infectious disease epidemiology. Epidemiologists should understand how ML algorithms behave within the context of outbreak data, where missingness of data is almost ubiquitous.
Methods: Using simulated data, we use an ML algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness, i.e., missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).
Results: Across ML methods, dataset sizes and proportions of training data used, the area under the receiver operating characteristic curve decreased by 7% (median, range: 1%-16%) when missingness was increased from 10% to 40%. Overall reduction in CFR bias for MAR across methods, proportion of missingness, outbreak size and proportion of training data was 0.5% (median, range: 0%-11%).
Conclusion: ML methods could reduce bias and increase the precision in CFR estimates at low levels of missingness. However, no method is robust to high percentages of missingness. Thus, a data-centric approach is recommended in outbreak settings: patient survival outcome data should be prioritised for collection, and random-sample follow-ups should be implemented to ascertain missing outcomes.
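To make the distinction between missingness types concrete, the following is a minimal Python sketch, not the authors' simulation framework. It draws hypothetical survival outcomes, removes some of them either independently of the outcome (MCAR) or with a probability that depends on the outcome (MNAR), and compares the naive CFR computed from the remaining known outcomes. The true CFR of 0.6, the missingness probabilities, the outbreak size and all function names are assumptions chosen only for illustration. MAR is not shown because it requires covariates.

```python
import random


def simulate_outcomes(n, true_cfr, rng):
    """Return n hypothetical outcomes: 1 = died, 0 = survived."""
    return [1 if rng.random() < true_cfr else 0 for _ in range(n)]


def apply_missingness(outcomes, p_missing_death, p_missing_survival, rng):
    """Set outcomes to None with a probability that may depend on the outcome.

    Equal probabilities give MCAR; unequal probabilities give MNAR,
    because missingness then depends on the unobserved value itself.
    """
    observed = []
    for y in outcomes:
        p = p_missing_death if y == 1 else p_missing_survival
        observed.append(None if rng.random() < p else y)
    return observed


def naive_cfr(observed):
    """CFR among cases with a known outcome (missing outcomes are dropped)."""
    known = [y for y in observed if y is not None]
    return sum(known) / len(known)


rng = random.Random(0)  # fixed seed so the run is reproducible
outcomes = simulate_outcomes(20000, 0.6, rng)  # assumed true CFR of 60%

full = naive_cfr(outcomes)
# MCAR: 40% of outcomes missing regardless of survival status.
mcar = naive_cfr(apply_missingness(outcomes, 0.4, 0.4, rng))
# MNAR (assumed): survivors are lost to follow-up more often than deaths.
mnar = naive_cfr(apply_missingness(outcomes, 0.1, 0.4, rng))

print(f"complete data: {full:.3f}")
print(f"MCAR, 40% missing: {mcar:.3f}")
print(f"MNAR, survivors under-recorded: {mnar:.3f}")
```

Under these assumed settings, the MCAR estimate stays close to the complete-data CFR and mainly loses precision, whereas the MNAR estimate is biased upwards because deaths are over-represented among the known outcomes.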
Pages: 15