Building test data from real outbreaks for evaluating detection algorithms

Cited by: 3
Authors
Texier, Gaetan [1, 2]
Jackson, Michael L. [3]
Siwe, Leonel [4]
Meynard, Jean-Baptiste [5]
Deparis, Xavier [5]
Chaudet, Herve [2]
Affiliations
[1] Pasteur Ctr Cameroun, Yaounde, Cameroon
[2] Aix Marseille Univ, INSERM, IRD, UMR 912, SESSTIM, Fac Med, 27 Bd Jean Moulin, Marseille, France
[3] Grp Hlth Res Inst, Seattle, WA USA
[4] ISSEA, Yaounde, Cameroon
[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France
Source
PLOS ONE | 2017 / Vol. 12 / Issue 09
Keywords
INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL
DOI
10.1371/journal.pone.0183992
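Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09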
Abstract
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The dataset of generated outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by a resampling process (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the input parameters that matter most for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that the type of algorithm and the simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of simulated days increased and improved as the number of simulated cases increased. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
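To make the two-stage idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the homothetic transformation can be approximated by linearly interpolating the historical epidemic curve onto a new number of days, and it uses inverse transform sampling as the resampling step. The function names (`rescale_curve`, `sample_outbreak_itsm`) and the daily counts are hypothetical and chosen only for illustration.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(daily_counts, target_days):
    """Stretch or shrink a curve to target_days (assumed linear interpolation),
    then normalize it into a probability distribution over days."""
    n = len(daily_counts)
    if n < 2 or target_days < 2:
        raise ValueError("need at least two days on both curves")
    scaled = []
    for j in range(target_days):
        # Map day j of the new curve onto the historical time axis.
        pos = j * (n - 1) / (target_days - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        scaled.append((1 - frac) * daily_counts[lo] + frac * daily_counts[hi])
    total = sum(scaled)
    if total <= 0:
        raise ValueError("historical curve has no cases")
    return [v / total for v in scaled]


def sample_outbreak_itsm(probabilities, target_cases, rng):
    """Inverse transform sampling: assign each case to a day via the discrete CDF."""
    cdf = list(accumulate(probabilities))
    counts = [0] * len(probabilities)
    for _ in range(target_cases):
        u = rng.random()  # uniform draw in [0, 1)
        # First day whose cumulative probability exceeds u; the min() guards
        # against a final CDF value slightly below 1 due to rounding.
        day = min(bisect.bisect_right(cdf, u), len(cdf) - 1)
        counts[day] += 1
    return counts


# Hypothetical historical outbreak: 8 days, 38 cases (illustrative values only).
historical = [1, 3, 8, 12, 7, 4, 2, 1]
probs = rescale_curve(historical, target_days=12)
simulated = sample_outbreak_itsm(probs, target_cases=60, rng=random.Random(42))
print(simulated)
```

Here the simulated outbreak has 60 cases over 12 days, so cases outnumber days. Per the abstract, choosing fewer cases than days (an overall scale factor below 1) would be expected to lose much of the historical shape.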
Pages: 17