Anomaly Detection in the Presence of Missing Values for Weather Data Quality Control

Cited by: 13
Authors
Zemicheal, Tadesse [1]
Dietterich, Thomas G. [1]
Affiliation
[1] Oregon State Univ, Corvallis, OR 97331 USA
Funding
National Science Foundation (USA)
Keywords
DOI
10.1145/3314344.3332490
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Accurate weather data is important for improving agricultural productivity in developing countries. Unfortunately, weather sensors can fail for a wide variety of reasons. One approach to detecting failed sensors is to identify statistical anomalies in the joint distribution of sensor readings. This powerful method can break down if some of the sensor readings are missing. This paper evaluates five strategies for handling missing values in anomaly detection: (a) mean imputation, (b) MAP imputation, (c) reduction (reduced-dimension anomaly detectors via feature bagging), (d) marginalization (for density estimators only), and (e) proportional distribution (for tree-based methods only). Our analysis suggests that MAP imputation and proportional distribution should give better results than mean imputation, reduction, and marginalization. These hypotheses are largely confirmed by experimental studies on synthetic data and on anomaly detection benchmark data sets using the Isolation Forest (IF), LODA, and EGMM anomaly detection algorithms. However, marginalization worked surprisingly well for EGMM, and there are exceptions where reduction works well on some benchmark problems. We recommend proportional distribution for IF, MAP imputation for LODA, and marginalization for EGMM.
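Three of the strategies named in the abstract (mean imputation, MAP imputation, and marginalization) can be illustrated with a minimal sketch. It assumes a single multivariate Gaussian as the density model, a simplified stand-in for EGMM rather than the authors' implementation, and uses negative log density as the anomaly score. All function names are hypothetical. For a Gaussian, the MAP fill for the missing entries is the conditional mean given the observed entries.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a single Gaussian (assumed stand-in for a density-based detector)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def neg_log_density(x, mu, cov):
    """Anomaly score: negative log density of x under N(mu, cov)."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return 0.5 * (len(mu) * np.log(2 * np.pi) + logdet + maha)

def score_mean_imputation(x, mu, cov):
    """Replace each missing entry (NaN) with that feature's training mean."""
    filled = np.where(np.isnan(x), mu, x)
    return neg_log_density(filled, mu, cov)

def score_map_imputation(x, mu, cov):
    """Replace missing entries with their most probable value given the
    observed entries (the conditional mean, for a Gaussian)."""
    m = np.isnan(x)
    o = ~m
    filled = x.copy()
    if m.any() and o.any():
        gain = cov[np.ix_(m, o)] @ np.linalg.solve(cov[np.ix_(o, o)], x[o] - mu[o])
        filled[m] = mu[m] + gain
    elif m.any():
        filled[m] = mu[m]
    return neg_log_density(filled, mu, cov)

def score_marginalization(x, mu, cov):
    """Score only the observed entries under the marginal Gaussian."""
    o = ~np.isnan(x)
    return neg_log_density(x[o], mu[o], cov[np.ix_(o, o)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, strongly correlated two-feature data (illustrative only).
    X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
    mu, cov = fit_gaussian(X)
    query = np.array([3.0, np.nan])  # second sensor reading is missing
    print("mean imputation:", score_mean_imputation(query, mu, cov))
    print("MAP imputation: ", score_map_imputation(query, mu, cov))
    print("marginalization:", score_marginalization(query, mu, cov))
```

In this sketch, mean imputation ignores the correlation between features, so the filled-in point can land in a low-density region and look more anomalous than the observed value alone justifies. MAP imputation never yields a higher score than mean imputation under the same Gaussian, because it picks the fill value that maximizes the joint density.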
Pages: 65-73
Page count: 9
Related papers
  • [1] Unsupervised Anomaly Detection in Data Quality Control
    Poon, Lex
    Farshidi, Siamak
    Li, Na
    Zhao, Zhiming
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 2327-2336
  • [2] Collection of Historical Weather Data: Issues with Missing Values
    Rafii, Fadoua
    Kechadi, Tahar
    4TH INTERNATIONAL CONFERENCE ON SMART CITY APPLICATIONS (SCA '19), 2019
  • [3] Clustering Data with the Presence of Missing Values by Ensemble Approach
    Pattanodom, Mullika
    Iam-On, Natthakan
    Boongoen, Tossapon
    2016 SECOND ASIAN CONFERENCE ON DEFENCE TECHNOLOGY (ACDT), 2016: 151-156
  • [4] A Generic Approach of Filling Missing Values in NCDC Weather Stations Data
    Hosahalli, Doreswamy
    Gad, Ibrahim
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018: 143-149
  • [5] Anomaly Detection for Analysis of Annual Inventory Data: A Quality Control Approach
    Roesch, Francis A.
    Van Deusen, Paul C.
    SOUTHERN JOURNAL OF APPLIED FORESTRY, 2010, 34 (03): 131-137
  • [6] Monitoring data quality for telehealth systems in the presence of missing data
    Mahmood, Tahir
    Wittenberg, Philipp
    Zwetsloot, Inez Maria
    Wang, Hailiang
    Tsui, Kwok Leung
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2019, 126: 156-163
  • [7] The Use of Spatial Interpolation to Improve the Quality of Corn Silage Data in Case of Presence of Extreme or Missing Values
    Koutsos, Thomas M.
    Menexes, Georgios C.
    Eleftherohorinos, Ilias G.
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2022, 11 (03)
  • [8] Graph spatiotemporal process for multivariate time series anomaly detection with missing values
    Zheng, Yu
    Koh, Huan Yee
    Jin, Ming
    Chi, Lianhua
    Wang, Haishuai
    Phan, Khoa T.
    Chen, Yi-Ping Phoebe
    Pan, Shirui
    Xiang, Wei
    INFORMATION FUSION, 2024, 106
  • [9] NEW CONTROL CHART FOR MULTIVARIATE DATA WITH MISSING VALUES
    FURUTANI, H
    YAMAMOTO, K
    OGURA, H
    KITAZOE, Y
    COMPUTERS AND BIOMEDICAL RESEARCH, 1988, 21 (01): 1-8
  • [10] Change-Point Detection for Graphical Models in the Presence of Missing Values
    Londschien, Malte
    Kovacs, Solt
    Buhlmann, Peter
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2021, 30 (03): 768-779