Anomaly Detection in the Presence of Missing Values for Weather Data Quality Control

Cited by: 13
Authors
Zemicheal, Tadesse [1]
Dietterich, Thomas G. [1]
Affiliation
[1] Oregon State Univ, Corvallis, OR 97331 USA
Source
COMPASS '19 - PROCEEDINGS OF THE CONFERENCE ON COMPUTING & SUSTAINABLE SOCIETIES | 2019
Funding
U.S. National Science Foundation
DOI
10.1145/3314344.3332490
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Accurate weather data is important for improving agricultural productivity in developing countries. Unfortunately, weather sensors can fail for a wide variety of reasons. One approach to detecting failed sensors is to identify statistical anomalies in the joint distribution of sensor readings. This powerful method can break down if some of the sensor readings are missing. This paper evaluates five strategies for handling missing values in anomaly detection: (a) mean imputation, (b) MAP imputation, (c) reduction (reduced-dimension anomaly detectors via feature bagging), (d) marginalization (for density estimators only), and (e) proportional distribution (for tree-based methods only). Our analysis suggests that MAP imputation and proportional distribution should give better results than mean imputation, reduction, and marginalization. These hypotheses are largely confirmed by experimental studies on synthetic data and on anomaly detection benchmark data sets using the Isolation Forest (IF), LODA, and EGMM anomaly detection algorithms. However, marginalization worked surprisingly well for EGMM, and there are exceptions where reduction works well on some benchmark problems. We recommend proportional distribution for IF, MAP imputation for LODA, and marginalization for EGMM.
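The abstract contrasts mean imputation with marginalization for density-based detectors. What follows is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-component Gaussian mixture over three features with diagonal covariances, chosen so that marginalizing a missing feature simply drops its factor and the code needs only the standard library. The paper's EGMM detector is not reproduced here. The names `WEIGHTS`, `MEANS`, `VARIANCES`, `log_density_marginal`, `mean_impute`, and `anomaly_score` are illustrative only.

```python
import math
from typing import List, Optional

# Hypothetical toy model (an assumption, not the paper's EGMM): a two-component
# Gaussian mixture over three features with diagonal covariances.
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
VARIANCES = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

Row = List[Optional[float]]  # None marks a missing sensor reading


def _log_normal(x: float, mu: float, var: float) -> float:
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_density_marginal(x: Row) -> float:
    """Mixture log-density of the observed features only (marginalization).

    With diagonal covariance, integrating out a missing feature removes
    exactly its factor from every component, so missing entries are skipped.
    """
    comp_logs = []
    for w, mu, var in zip(WEIGHTS, MEANS, VARIANCES):
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            if xi is not None:
                lp += _log_normal(xi, m, v)
        comp_logs.append(lp)
    top = max(comp_logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))


def mean_impute(x: Row, feature_means: List[float]) -> List[float]:
    """Replace each missing entry with that feature's overall mean."""
    return [m if xi is None else xi for xi, m in zip(x, feature_means)]


def anomaly_score(x: Row) -> float:
    """Negative log-density: a higher score means more anomalous."""
    return -log_density_marginal(x)


if __name__ == "__main__":
    feature_means = [2.5, 2.5, 2.5]  # overall mean lies between the two clusters
    reading = [0.0, None, 0.0]       # normal reading from cluster 0, one sensor down
    print("marginalized:", anomaly_score(reading))
    print("mean-imputed:", anomaly_score(mean_impute(reading, feature_means)))
    print("fully observed normal:", anomaly_score([0.0, 0.0, 0.0]))
```

In this toy setting the mean-imputed reading scores as more anomalous than a fully observed reading from the same cluster, because the imputed value 2.5 falls in the low-density gap between the clusters; marginalization scores only what was observed. Note that marginalized scores come from a lower-dimensional density, so comparing them directly with scores of fully observed points needs care.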
Pages: 65-73
Page count: 9