Building test data from real outbreaks for evaluating detection algorithms

Cited by: 3

Authors:

Texier, Gaetan [1,2]

Jackson, Michael L. [3]

Siwe, Leonel [4]

Meynard, Jean-Baptiste [5]

Deparis, Xavier [5]

Chaudet, Herve [2]

Affiliations:

[1] Pasteur Ctr Cameroun, Yaounde, Cameroon

[2] Aix Marseille Univ, INSERM, IRD, UMR 912, SESSTIM, Fac Med, 27 Bd Jean Moulin, Marseille, France

[3] Grp Hlth Res Inst, Seattle, WA USA

[4] ISSEA, Yaounde, Cameroon

[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France

Source:

PLOS ONE | 2017 / Volume 12 / Issue 09

Keywords:

INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL;

DOI:

10.1371/journal.pone.0183992

Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The dataset of generated outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by one of several resampling processes: Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, or Hybrid Gibbs Sampler. We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that the type of algorithm used and the simulation parameters (i.e., days, number of cases, outbreak shape, overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of days simulated increased and improved as the number of cases simulated increased. Simulating outbreaks with fewer cases than days of duration (i.e., an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
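The rescale-then-resample idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the homothetic transformation is computed, so stretching the curve by linear interpolation is an assumption made here, and the function names `rescale_curve` and `sample_outbreak` and the example counts are hypothetical. Of the resampling processes listed, only inverse transform sampling is shown.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(daily_counts, target_days):
    """Stretch or compress a historical epidemic curve to target_days.

    Assumption: linear interpolation stands in for the paper's homothetic
    transformation. Returns a probability distribution over simulated days.
    """
    n = len(daily_counts)
    if n < 2 or target_days < 2:
        raise ValueError("need at least two historical and two simulated days")
    weights = []
    for j in range(target_days):
        # Position of simulated day j on the historical time axis.
        x = j * (n - 1) / (target_days - 1)
        lo = int(x)
        hi = min(lo + 1, n - 1)
        frac = x - lo
        weights.append(daily_counts[lo] * (1 - frac) + daily_counts[hi] * frac)
    total = sum(weights)
    if total <= 0:
        raise ValueError("historical curve must contain at least one case")
    return [w / total for w in weights]


def sample_outbreak(probabilities, target_cases, rng):
    """Draw target_cases onset days by inverse transform sampling."""
    cdf = list(accumulate(probabilities))
    counts = [0] * len(probabilities)
    for _ in range(target_cases):
        u = rng.random()
        # First day whose cumulative probability exceeds u; the clamp guards
        # against floating-point rounding at the top of the CDF.
        day = min(bisect.bisect_right(cdf, u), len(cdf) - 1)
        counts[day] += 1
    return counts


# Hypothetical historical outbreak: daily case counts over six days.
historical = [1, 4, 9, 6, 3, 1]
probs = rescale_curve(historical, target_days=10)
simulated = sample_outbreak(probs, target_cases=40, rng=random.Random(0))
```

Here 40 cases are spread over 10 days, so the ratio of cases to days is 4. Choosing `target_cases` smaller than `target_days` gives the regime (scale factor below 1) in which the abstract reports a substantial loss of information.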


Pages: 17

