Building test data from real outbreaks for evaluating detection algorithms

Cited by: 3
Authors
Texier, Gaetan [1 ,2 ]
Jackson, Michael L. [3 ]
Siwe, Leonel [4 ]
Meynard, Jean-Baptiste [5 ]
Deparis, Xavier [5 ]
Chaudet, Herve [2 ]
Affiliations
[1] Pasteur Ctr Cameroun, Yaounde, Cameroon
[2] Aix Marseille Univ, INSERM, IRD, UMR 912,SESSTIM,Fac Med, 27,Bd Jean Moulin, Marseille, France
[3] Grp Hlth Res Inst, Seattle, WA USA
[4] ISSEA, Yaounde, Cameroon
[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France
Source
PLOS ONE | 2017 / Vol. 12 / Issue 09
Keywords
INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL;
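DOI
10.1371/journal.pone.0183992
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract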
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The generated dataset of outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by a resampling process (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the input parameters that matter most for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that the type of algorithm used and the simulation parameters (i.e. number of days, number of cases, outbreak shape, overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of simulated days increased and improved as the number of simulated cases increased. Simulating outbreaks with fewer cases than days of duration (i.e. an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
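The two-step idea in the abstract (rescale a historical curve, then resample it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the details of the homothetic transformation, so the sketch assumes a simple area-preserving rebinning of the time axis, and it shows only the ITSM resampler. The function names `rescale_curve` and `simulate_outbreak` and the example counts are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(daily_counts, target_days):
    """Stretch or compress a historical epidemic curve to target_days.

    Assumption: the time axis is rescaled by area-preserving rebinning,
    treating the historical counts as a piecewise-constant density.
    Returns a probability for each simulated day.
    """
    n = len(daily_counts)
    total = sum(daily_counts)
    if n == 0 or target_days < 1 or total <= 0:
        raise ValueError("need a non-empty curve with cases and target_days >= 1")

    # cumulative[i] = number of historical cases before day i
    cumulative = [0] + list(accumulate(daily_counts))

    def cases_before(x):
        # Cases accumulated up to (possibly fractional) time x on the old axis.
        i = min(int(x), n - 1)
        return cumulative[i] + (x - i) * daily_counts[i]

    probs = []
    for d in range(target_days):
        start = d * n / target_days
        end = (d + 1) * n / target_days
        probs.append((cases_before(end) - cases_before(start)) / total)
    return probs


def simulate_outbreak(daily_counts, target_days, target_cases, rng=None):
    """Draw target_cases onset days by inverse transform sampling (ITSM)."""
    rng = rng or random.Random()
    cdf = list(accumulate(rescale_curve(daily_counts, target_days)))
    simulated = [0] * target_days
    for _ in range(target_cases):
        u = rng.random()
        # First day whose cumulative probability exceeds u; the clamp guards
        # against floating-point rounding at the top of the CDF.
        day = min(bisect.bisect_right(cdf, u), target_days - 1)
        simulated[day] += 1
    return simulated


if __name__ == "__main__":
    historical = [0, 2, 5, 9, 4, 1, 0]  # hypothetical 7-day outbreak
    print(simulate_outbreak(historical, 14, 100, random.Random(42)))
```

In this sketch the ratio `target_cases / target_days` plays the role of the overall scale factor; when it falls below 1, most simulated days receive zero cases, which is one way to picture the loss of information reported in the abstract. Each case is drawn independently here, so the sketch does not capture the data dependency that the abstract associates with the Gibbs-based samplers.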
Pages: 17