Anomaly Detection in the Presence of Missing Values for Weather Data Quality Control

Cited by: 13
Authors
Zemicheal, Tadesse [1]
Dietterich, Thomas G. [1]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
Source
COMPASS '19 - PROCEEDINGS OF THE CONFERENCE ON COMPUTING & SUSTAINABLE SOCIETIES | 2019
Funding
U.S. National Science Foundation;
Keywords
DOI
10.1145/3314344.3332490
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Accurate weather data is important for improving agricultural productivity in developing countries. Unfortunately, weather sensors can fail for a wide variety of reasons. One approach to detecting failed sensors is to identify statistical anomalies in the joint distribution of sensor readings. This powerful method can break down if some of the sensor readings are missing. This paper evaluates five strategies for handling missing values in anomaly detection: (a) mean imputation, (b) MAP imputation, (c) reduction (reduced-dimension anomaly detectors via feature bagging), (d) marginalization (for density estimators only), and (e) proportional distribution (for tree-based methods only). Our analysis suggests that MAP imputation and proportional distribution should give better results than mean imputation, reduction, and marginalization. These hypotheses are largely confirmed by experimental studies on synthetic data and on anomaly detection benchmark data sets using the Isolation Forest (IF), LODA, and EGMM anomaly detection algorithms. However, marginalization worked surprisingly well for EGMM, and there are exceptions where reduction works well on some benchmark problems. We recommend proportional distribution for IF, MAP imputation for LODA, and marginalization for EGMM.
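The difference between mean imputation and marginalization for a density-based detector can be illustrated with a minimal sketch. It is not the paper's EGMM: it assumes a single Gaussian with independent features as a stand-in density estimator, encodes missing readings as None, and uses hypothetical function names chosen only for this illustration. The anomaly score is the negative log density.

```python
import math

# Illustrative stand-in only: one diagonal Gaussian, not the EGMM ensemble
# evaluated in the paper. Missing readings are encoded as None (assumption).

def fit_diagonal_gaussian(rows):
    """Estimate per-feature mean and variance from the observed entries."""
    n_features = len(rows[0])
    means, variances = [], []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        mu = sum(observed) / len(observed)
        var = sum((x - mu) ** 2 for x in observed) / len(observed)
        means.append(mu)
        variances.append(max(var, 1e-9))  # guard against zero variance
    return means, variances

def log_density_1d(x, mu, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def score_mean_imputation(row, means, variances):
    """Replace each missing value with the feature mean, then score all features."""
    filled = [mu if x is None else x for x, mu in zip(row, means)]
    return -sum(log_density_1d(x, mu, v)
                for x, mu, v in zip(filled, means, variances))

def score_marginalization(row, means, variances):
    """Integrate missing features out: score the observed features only."""
    return -sum(log_density_1d(x, mu, v)
                for x, mu, v in zip(row, means, variances)
                if x is not None)
```

Under this independence assumption the two scores differ only by a constant term for each missing feature, so they diverge in how queries with different missingness patterns are ranked against each other. In models with correlated features, such as the Gaussian mixtures used by EGMM, the strategies can behave more differently.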
Pages: 65-73
Number of pages: 9
Related papers
50 items in total
  • [31] Handling missing values in patient-reported outcome data in the presence of intercurrent events
    Thomassen, Doranne
    Roychoudhury, Satrajit
    Amdal, Cecilie Delphin
    Reynders, Dries
    Musoro, Jammbe Z.
    Sauerbrei, Willi
    Goetghebeur, Els
    le Cessie, Saskia
    SISAQOL IMI Work Package, Rajesh
    BMC MEDICAL RESEARCH METHODOLOGY, 2025, 25 (01)
  • [32] COMPARISON OF THE PREDICTIVE VALUES OF MULTIPLE BINARY DIAGNOSTIC TESTS IN THE PRESENCE OF IGNORABLE MISSING DATA
    Eugenia Marin-Jimenez, Ana
    Antonio Roldan-Nofuentes, Jose
    REVSTAT-STATISTICAL JOURNAL, 2017, 15 (01) : 45 - 64
  • [33] Data Quality Challenges with Missing Values and Mixed Types in Joint Sequence Analysis
    Lazar, Alina
    Jin, Ling
    Spurlock, C. Anna
    Todd, Annika
    Wu, Kesheng
    Sim, Alex
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 2620 - 2627
  • [34] Probabilistic principal component analysis-based anomaly detection for structures with missing data
    Ma, Zhi
    Yun, Chung-Bang
    Wan, Hua-Ping
    Shen, Yanbin
    Yu, Feng
    Luo, Yaozhi
    STRUCTURAL CONTROL & HEALTH MONITORING, 2021, 28 (05)
  • [35] Detection of Bad Data and Estimation of Missing Parameter Values Using System Synergism
    Khond, Sudarshan R.
    Kale, Vijay S.
    Ballal, Makarand Sudhakar
    IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, 2023, 59 (05) : 5646 - 5658
  • [36] Quality Control of Weather Radar Data Using Polarimetric Variables
    Lakshmanan, Valliappa
    Karstens, Christopher
    Krause, John
    Tang, Lin
    JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY, 2014, 31 (06) : 1234 - 1249
  • [37] ANALYSIS OF DATA WITH MISSING VALUES - COMMENTARY
    LITTLE, RJA
    STATISTICS IN MEDICINE, 1988, 7 (1-2) : 347 - 355
  • [38] Automatic quality control of weather data for timely decisions in agriculture
    Dandrifosse, Sebastien
    Jago, Alban
    Huart, Jean Pierre
    Michaud, Valery
    Planchon, Viviane
    Rosillon, Damien
    SMART AGRICULTURAL TECHNOLOGY, 2024, 8
  • [39] Handling missing values in trait data
    Johnson, Thomas F.
    Isaac, Nick J. B.
    Paviolo, Agustin
    Gonzalez-Suarez, Manuela
    GLOBAL ECOLOGY AND BIOGEOGRAPHY, 2021, 30 (01) : 51 - 62
  • [40] Analyzing Longitudinal Data With Missing Values
    Enders, Craig K.
    REHABILITATION PSYCHOLOGY, 2011, 56 (04) : 267 - 288