Detection of multivariate outliers in business survey data with incomplete information

Times cited: 27
Authors
Todorov, Valentin [1]
Templ, Matthias [2,3]
Filzmoser, Peter [3]
Affiliations
[1] UNIDO, Vienna Int Ctr, A-1400 Vienna, Austria
[2] Vienna Univ Technol, Dept Methodol, A-1040 Vienna, Austria
[3] Vienna Univ Technol, Dept Stat & Probabil Theory, A-1040 Vienna, Austria
Keywords
Multivariate outlier detection; Robust statistics; Missing values; LOCATION; ESTIMATORS; IMPUTATION; BACON; VIEW
DOI
10.1007/s11634-010-0075-2
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Many different methods for statistical data editing can be found in the literature, but only a few of them are based on robust estimates (for example, BACON-EEM, epidemic algorithms (EA) and the transformed rank correlation (TRC) methods of Béguin and Hulliger). However, we show that outlier detection is only reasonable if robust methods are applied, because the classical estimates are themselves influenced by the outliers. Nevertheless, data editing is essential for checking multivariate data for possible data problems, and it is not deterministic like traditional micro editing, where all records are extensively edited by hand according to certain rules or constraints. Missing values are the rule rather than the exception in business surveys and pose additional, severe challenges to outlier detection. We first review the available multivariate outlier detection methods that can cope with incomplete data. We then compare several approaches in a simulation study based on a simulated subset of the Austrian Structural Business Statistics. Robust methods based on the Minimum Covariance Determinant (MCD) estimator, S-estimators and the OGK estimator, as well as BACON-BEM, give the best results, both in finding the outliers and in keeping the false discovery rate low. Many of the discussed methods are implemented in an R package released under the GNU General Public License.
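The abstract's central argument has two parts. Classical estimates are distorted by the very outliers they should expose (masking), and incomplete records complicate detection further. Both can be illustrated with a minimal Python sketch, assuming NumPy is available. The helpers `chi2_quantile`, `sq_mahalanobis`, `mcd_csteps`, `partial_sq_distances` and `flag` are hypothetical names written for this illustration, not functions from the authors' R package. The MCD estimate is a crude FAST-MCD-style search (random starts refined by concentration steps, without the usual reweighting step). It is fit on complete rows only, rather than with the EM-type treatment of missing data reviewed in the paper. The chi-square cutoff uses the Wilson-Hilferty approximation. Incomplete rows are scored on their observed coordinates only, with the degrees of freedom set to the number of observed values.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(q, df):
    """Wilson-Hilferty approximation of the chi-square q-quantile (avoids SciPy)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * np.asarray(df, dtype=float))
    return df * (1.0 - a + z * np.sqrt(a)) ** 3


def sq_mahalanobis(X, mu, S):
    """Squared Mahalanobis distance of every row of X from mu under scatter S."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)


def mcd_csteps(X, n_starts=100, n_steps=10, seed=0):
    """Rough MCD sketch: random (p+1)-point starts refined by C-steps.

    X must contain complete rows only. Returns a consistency-scaled (mu, S).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2  # default MCD subset size (about half the data)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_steps):  # C-step: keep the h points closest to the fit
            S = np.cov(X[idx], rowvar=False)
            if np.linalg.det(S) <= 0:  # degenerate start, abandon it
                break
            d2 = sq_mahalanobis(X, X[idx].mean(axis=0), S)
            idx = np.argsort(d2)[:h]
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if len(idx) == h and 0 < det < best_det:
            best_det, best_idx = det, idx
    mu = X[best_idx].mean(axis=0)
    S = np.cov(X[best_idx], rowvar=False)
    # Rescale so the median squared distance matches the chi-square median.
    S *= np.median(sq_mahalanobis(X, mu, S)) / chi2_quantile(0.5, p)
    return mu, S


def partial_sq_distances(X, mu, S):
    """Squared distance of each row using only its observed (non-NaN) coordinates."""
    d2 = np.zeros(len(X))
    k = np.zeros(len(X), dtype=int)
    for i, row in enumerate(X):
        obs = ~np.isnan(row)
        k[i] = obs.sum()
        if k[i]:
            diff = row[obs] - mu[obs]
            d2[i] = diff @ np.linalg.solve(S[np.ix_(obs, obs)], diff)
    return d2, k


def flag(X, mu, S, q=0.975):
    """Flag rows whose distance exceeds the chi-square cutoff for their observed dimension."""
    d2, k = partial_sq_distances(X, mu, S)
    return (k > 0) & (d2 > chi2_quantile(q, np.maximum(k, 1)))


# Simulated data: 160 correlated clean rows plus a tight cluster of 40 outliers.
rng = np.random.default_rng(1)
clean = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], size=160)
outliers = np.array([2.0, -2.0]) + 0.1 * rng.standard_normal((40, 2))
X = np.vstack([clean, outliers])
# Blank out one coordinate in 10 clean rows to create incomplete records.
X[rng.choice(160, size=10, replace=False), rng.integers(0, 2, size=10)] = np.nan

complete = X[~np.isnan(X).any(axis=1)]
mu_rob, S_rob = mcd_csteps(complete)
robust = flag(X, mu_rob, S_rob)
classical = flag(X, complete.mean(axis=0), np.cov(complete, rowvar=False))

is_out = np.arange(len(X)) >= 160
print("robust:    caught", robust[is_out].sum(), "of 40, false alarms", robust[~is_out].sum())
print("classical: caught", classical[is_out].sum(), "of 40, false alarms", classical[~is_out].sum())
```

In this sketch, 40 of the 200 records form a tight cluster lying across the correlation axis. The classical covariance is stretched toward that cluster, so the planted points' classical distances fall below the 97.5% cutoff. The robust fit is dominated by the clean majority and flags them. For real analyses, a tested implementation (for example `MinCovDet` in scikit-learn, or the R package mentioned in the abstract) would be preferable to this illustration.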
Pages: 37-56
Number of pages: 20