Robust Multivariate Outlier Detection Methods for Environmental Data

Cited by: 25
Authors
Alameddine, Ibrahim [1]
Kenney, Melissa A. [2]
Gosnell, Russell J. [3]
Reckhow, Kenneth H. [1]
Affiliations
[1] Duke Univ, Nicholas Sch Environm, Durham, NC 27708 USA
[2] Johns Hopkins Univ, Dept Geog & Environm Engn, Baltimore, MD 21218 USA
[3] N Carolina Cent Univ, Dept Math & Comp Sci, Durham, NC 27707 USA
Source
Funding
National Science Foundation (NSF)
Keywords
Robust outlier detection; Water quality; Data analysis; Eutrophication; National lake eutrophication survey; Outlier; Principal component; High dimension; Estimators; Location; Shape
DOI
10.1061/(ASCE)EE.1943-7870.0000271
Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Outliers are an inevitable concern that must be identified and dealt with whenever one analyzes a large data set. Today's water quality data are often collected on different scales, encompass several sites, monitor several correlated parameters, involve many individuals from several agencies, and span several years. As such, the ability to identify outliers, which may affect the results of the analysis, is crucial. This note presents several statistical techniques that have been developed to deal with this problem, with particular emphasis on robust multivariate methods. These techniques can isolate outliers while overcoming the effects of masking (a group of outliers inflating the classical mean and covariance estimates enough to hide themselves), which can hinder the effectiveness of common outlier detection techniques such as Mahalanobis distances (MD). This note uses a comprehensive national metadata set on lake water quality as a case study to analyze the effectiveness of three robust outlier detection techniques, namely, the minimum covariance determinant (MCD), the minimum volume ellipsoid (MVE), and M-estimators. The note compares the results generated from these three techniques to assess how severe each method is when labeling observations as outliers. The results demonstrate the limitations of using MD to analyze multidimensional water quality data. The analysis also highlights the differences between the three robust multivariate methods: the MVE method was the most severe in flagging outliers, while the MCD was the most lenient. Of the three robust multivariate outlier detection methods analyzed, the M-estimator proved to be the most flexible because it allowed many borderline observations to be downweighted rather than censored.
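The masking problem and the contrast between censoring and downweighting can be illustrated with a small sketch. It assumes synthetic two-variable data (80 correlated "clean" points plus a tight cluster of 20 outliers), not the note's lake data. It compares classical MD, a simplified FAST-MCD (random starts refined by concentration steps, then a consistency correction and one reweighting step), and an iteratively reweighted M-estimator. The M-estimator's redescending weight function and its constants `b1` and `b2` are illustrative assumptions, not the weights used in the note, and MVE is omitted. The chi-square cutoff uses the closed form that holds only for 2 degrees of freedom, which avoids a SciPy dependency.

```python
import numpy as np

P_CUTOFF = 0.975  # tolerance-ellipse probability (assumed; a common choice)


def chi2_quantile_2df(prob):
    """Chi-square quantile for 2 df only: P(X <= q) = 1 - exp(-q/2)."""
    return -2.0 * np.log(1.0 - prob)


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of every row of X from center under cov."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def mcd(X, n_starts=50, n_csteps=20, seed=0):
    """Simplified FAST-MCD sketch: (p+1)-point random starts refined by C-steps,
    then a consistency correction and one reweighting step."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2  # size of the half-sample whose covariance determinant is minimized
    best_det, best = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        center, cov = X[idx].mean(axis=0), np.cov(X[idx], rowvar=False)
        for _ in range(n_csteps):
            if np.linalg.det(cov) <= 1e-12:  # degenerate start: abandon it
                break
            # C-step: refit on the h points closest to the current estimate
            subset = np.argsort(mahalanobis_sq(X, center, cov))[:h]
            center, cov = X[subset].mean(axis=0), np.cov(X[subset], rowvar=False)
        det = np.linalg.det(cov)
        if 1e-12 < det < best_det:
            best_det, best = det, (center, cov)
    if best is None:
        raise RuntimeError("all random starts were degenerate")
    center, cov = best
    # Rescale so the median squared distance matches the chi-square(2) median
    cov = cov * np.median(mahalanobis_sq(X, center, cov)) / chi2_quantile_2df(0.5)
    # Reweight: refit using only the points inside the tolerance ellipse
    keep = mahalanobis_sq(X, center, cov) <= chi2_quantile_2df(P_CUTOFF)
    return X[keep].mean(axis=0), np.cov(X[keep], rowvar=False)


def m_estimator(X, center, cov, b1=2.0, b2=1.25, n_iter=50):
    """Iteratively reweighted location/scatter. Weight is 1 for d <= d0 and
    tapers smoothly toward 0 beyond d0 (weight function is an assumption)."""
    p = X.shape[1]
    d0 = np.sqrt(p) + b1 / np.sqrt(2.0)
    for _ in range(n_iter):
        d = np.sqrt(mahalanobis_sq(X, center, cov))
        w = np.ones_like(d)
        far = d > d0
        w[far] = d0 * np.exp(-((d[far] - d0) ** 2) / (2.0 * b2**2)) / d[far]
        center = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - center
        cov = (w[:, None] ** 2 * diff).T @ diff / (np.sum(w**2) - 1.0)
    return center, cov, w  # w: weights used in the last update


rng = np.random.default_rng(42)
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=80)
cluster = rng.normal([4.0, -4.0], 0.1, size=(20, 2))  # 20% clustered outliers
X = np.vstack([clean, cluster])
cutoff = chi2_quantile_2df(P_CUTOFF)

classical = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False)) > cutoff
mcd_center, mcd_cov = mcd(X)
robust = mahalanobis_sq(X, mcd_center, mcd_cov) > cutoff
m_center, m_cov, weights = m_estimator(X, mcd_center, mcd_cov)

print("planted outliers flagged by classical MD:", classical[80:].sum())
print("planted outliers flagged by MCD:", robust[80:].sum())
print("largest M-estimator weight on a planted outlier:", weights[80:].max())
```

In this setup the outlier cluster drags the classical mean and covariance toward itself, so classical MD should flag few or none of the planted points, while distances based on the MCD estimate should flag all of them. The M-estimator, started from the MCD estimate, should assign the cluster weights near zero and leave most clean points at weight 1; points near the edge of the clean cloud receive intermediate weights instead of being cut off, which is the kind of downweighting the abstract credits for the M-estimator's flexibility.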
Pages: 1299–1304
Number of pages: 6