Anomaly Detection in the Presence of Missing Values for Weather Data Quality Control

Cited by: 13
Authors
Zemicheal, Tadesse [1]
Dietterich, Thomas G. [1]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
Source
COMPASS '19 - PROCEEDINGS OF THE CONFERENCE ON COMPUTING & SUSTAINABLE SOCIETIES | 2019
Funding
U.S. National Science Foundation
DOI
10.1145/3314344.3332490
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Accurate weather data is important for improving agricultural productivity in developing countries. Unfortunately, weather sensors can fail for a wide variety of reasons. One approach to detecting failed sensors is to identify statistical anomalies in the joint distribution of sensor readings. This powerful method can break down if some of the sensor readings are missing. This paper evaluates five strategies for handling missing values in anomaly detection: (a) mean imputation, (b) MAP imputation, (c) reduction (reduced-dimension anomaly detectors via feature bagging), (d) marginalization (for density estimators only), and (e) proportional distribution (for tree-based methods only). Our analysis suggests that MAP imputation and proportional distribution should give better results than mean imputation, reduction, and marginalization. These hypotheses are largely confirmed by experimental studies on synthetic data and on anomaly detection benchmark data sets using the Isolation Forest (IF), LODA, and EGMM anomaly detection algorithms. However, marginalization worked surprisingly well for EGMM, and there are exceptions where reduction works well on some benchmark problems. We recommend proportional distribution for IF, MAP imputation for LODA, and marginalization for EGMM.
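To make three of the strategies named in the abstract concrete, here is a minimal sketch that contrasts mean imputation, MAP imputation, and marginalization. It assumes a single two-dimensional Gaussian density as the anomaly detector, which is a deliberate simplification and not the IF, LODA, or EGMM detectors the paper evaluates. The sensor readings and all function names are hypothetical and chosen only for illustration.

```python
import math

def fit_gaussian_2d(rows):
    """Estimate the mean and covariance of fully observed 2-D readings."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    v1 = sum((r[0] - m1) ** 2 for r in rows) / n
    v2 = sum((r[1] - m2) ** 2 for r in rows) / n
    c12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / n
    return (m1, m2), (v1, v2, c12)

def mean_impute(x1, mean, cov):
    """Fill the missing second reading with its unconditional mean."""
    return (x1, mean[1])

def map_impute(x1, mean, cov):
    """Fill the missing second reading with its most probable value given x1.
    For a Gaussian this is the conditional mean."""
    v1, _, c12 = cov
    return (x1, mean[1] + (c12 / v1) * (x1 - mean[0]))

def joint_score(x, mean, cov):
    """Anomaly score: negative log density of the bivariate Gaussian."""
    v1, v2, c12 = cov
    det = v1 * v2 - c12 ** 2
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    maha = (v2 * d1 * d1 - 2 * c12 * d1 * d2 + v1 * d2 * d2) / det
    return 0.5 * maha + 0.5 * math.log(det) + math.log(2 * math.pi)

def marginal_score(x1, mean, cov):
    """Marginalization: score only the observed reading under its marginal."""
    v1 = cov[0]
    return 0.5 * (x1 - mean[0]) ** 2 / v1 + 0.5 * math.log(2 * math.pi * v1)

# Hypothetical, strongly correlated readings from two sensors.
readings = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8), (4.0, 8.0)]
mean, cov = fit_gaussian_2d(readings)

x1 = 4.0  # a normal first reading whose second reading is missing
print("mean-imputed score:", joint_score(mean_impute(x1, mean, cov), mean, cov))
print("MAP-imputed score: ", joint_score(map_impute(x1, mean, cov), mean, cov))
print("marginal score:    ", marginal_score(x1, mean, cov))
```

In this toy setting, mean imputation ignores the correlation between the sensors, so the filled-in point can land far from the data and an otherwise normal reading receives a high anomaly score. MAP imputation places the missing value at the mode of the conditional density, and for a single Gaussian its score differs from the marginal score only by a constant. That equivalence is a property of this simplified model and need not carry over to the detectors studied in the paper.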
Pages: 65-73
Number of pages: 9