Building test data from real outbreaks for evaluating detection algorithms

Cited by: 3
Authors
Texier, Gaetan [1, 2]
Jackson, Michael L. [3]
Siwe, Leonel [4]
Meynard, Jean-Baptiste [5]
Deparis, Xavier [5]
Chaudet, Herve [2]
Affiliations
[1] Pasteur Ctr Cameroun, Yaounde, Cameroon
[2] Aix Marseille Univ, INSERM, IRD, UMR 912, SESSTIM, Fac Med, 27, Bd Jean Moulin, Marseille, France
[3] Grp Hlth Res Inst, Seattle, WA USA
[4] ISSEA, Yaounde, Cameroon
[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France
Source
PLOS ONE | 2017 / Vol. 12 / Issue 09
Keywords
INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL;
DOI
10.1371/journal.pone.0183992
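Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes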
07 ; 0710 ; 09 ;
Abstract
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
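As a rough illustration of the idea summarised in the abstract, the sketch below rescales a historical epidemic curve to a new duration and then redistributes a chosen number of cases over it with inverse transform sampling. It is not the authors' implementation: the function names (`rescale_curve`, `simulate_outbreak`), the use of linear interpolation as a stand-in for the homothetic transformation, and the example counts are all assumptions made for illustration only.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(historical_counts, target_days):
    """Stretch or compress a daily curve to target_days (linear interpolation,
    an assumed stand-in for the homothetic transformation), then normalise it
    so the values sum to 1."""
    n = len(historical_counts)
    if n < 2 or target_days < 2:
        raise ValueError("both curves need at least two days")
    resampled = []
    for d in range(target_days):
        # Map the target day onto the historical time axis.
        pos = d * (n - 1) / (target_days - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        resampled.append((1 - frac) * historical_counts[lo]
                         + frac * historical_counts[hi])
    total = sum(resampled)
    if total <= 0:
        raise ValueError("rescaled curve has no mass to sample from")
    return [v / total for v in resampled]


def simulate_outbreak(historical_counts, target_days, target_cases, rng=None):
    """Draw target_cases onset days by inverse transform sampling from the
    rescaled curve and return the simulated daily counts."""
    rng = rng or random.Random()
    probs = rescale_curve(historical_counts, target_days)
    cdf = list(accumulate(probs))
    cdf[-1] = 1.0  # guard against floating-point drift in the last step
    simulated = [0] * target_days
    for _ in range(target_cases):
        u = rng.random()  # uniform draw in [0, 1)
        # First day whose cumulative probability exceeds u.
        day = bisect.bisect_right(cdf, u)
        simulated[day] += 1
    return simulated


# Hypothetical historical curve (daily case counts), not data from the paper.
historical = [1, 4, 9, 6, 3, 1]
signal = simulate_outbreak(historical, target_days=10, target_cases=200,
                           rng=random.Random(42))
scale_factor = 200 / 10  # cases per simulated day
```

Under this reading, the overall scale factor is the number of simulated cases divided by the number of simulated days. When it falls below 1, most days necessarily receive zero cases, which is consistent with the loss of information the abstract reports for that regime.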
Pages: 17