Detection of multivariate outliers in business survey data with incomplete information

Cited by: 27
Authors
Todorov, Valentin [1]
Templ, Matthias [2,3]
Filzmoser, Peter [3]
Affiliations
[1] UNIDO, Vienna Int Ctr, A-1400 Vienna, Austria
[2] Vienna Univ Technol, Dept Methodol, A-1040 Vienna, Austria
[3] Vienna Univ Technol, Dept Stat & Probabil Theory, A-1040 Vienna, Austria
Keywords
Multivariate outlier detection; Robust statistics; Missing values; LOCATION; ESTIMATORS; IMPUTATION; BACON; VIEW;
DOI
10.1007/s11634-010-0075-2
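Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract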
Many different methods for statistical data editing can be found in the literature, but only a few of them are based on robust estimates (for example, BACON-EEM, epidemic algorithms (EA) and the transformed rank correlation (TRC) methods of Béguin and Hulliger). However, we can show that outlier detection is only reasonable if robust methods are applied, because the classical estimates are themselves influenced by the outliers. Nevertheless, data editing is essential for checking multivariate data for possible data problems, and it is not deterministic like traditional micro editing, where all records are extensively edited manually using certain rules/constraints. The presence of missing values is more the rule than the exception in business surveys and poses additional severe challenges for outlier detection. First, we review the available multivariate outlier detection methods that can cope with incomplete data. In a simulation study, in which a subset of the Austrian Structural Business Statistics is simulated, we compare several approaches. Robust methods based on the Minimum Covariance Determinant (MCD) estimator, S-estimators and the OGK estimator, as well as BACON-BEM, provide the best results in finding the outliers and in providing a low false discovery rate. Many of the discussed methods are implemented in an R package available under the GNU General Public License.
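The abstract's claim that classical estimates are themselves influenced by the outliers (the masking effect) can be illustrated with a small sketch. This is not one of the paper's methods: it swaps in a univariate analogue, comparing mean/standard-deviation scores with median/MAD scores, on complete data. The data values, the helper names and the cutoff of 3 are all assumptions chosen for illustration.

```python
import statistics

def classical_scores(x):
    """Absolute z-scores based on the sample mean and standard deviation."""
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)
    return [abs(v - mean) / sd for v in x]

def robust_scores(x):
    """Absolute scores based on the median and the MAD (scaled by 1.4826
    so that it is consistent with the standard deviation under normality)."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    scale = 1.4826 * mad
    return [abs(v - med) / scale for v in x]

def flag(scores, cutoff=3.0):
    """Indices of observations whose score exceeds the (assumed) cutoff."""
    return [i for i, s in enumerate(scores) if s > cutoff]

# Hypothetical data: ten regular values near 10 and three gross outliers.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
        100.0, 105.0, 110.0]

# The outliers inflate the mean and standard deviation so much that
# no observation exceeds the cutoff: the outliers mask themselves.
print(flag(classical_scores(data)))  # -> []

# Median and MAD are barely affected, so the three outliers stand out.
print(flag(robust_scores(data)))     # -> [10, 11, 12]
```

In the multivariate setting the same idea applies to Mahalanobis distances, with the mean and covariance matrix replaced by robust estimates such as the MCD; handling missing values requires further steps that this sketch does not attempt.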
Pages: 37-56
Number of pages: 20