Identification of outliers in multivariate data

Cited by: 228
Authors
Rocke, DM
Woodruff, DL
Keywords
heuristic search; M estimation; minimum covariance determinant; S estimation
DOI
10.2307/2291724
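Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714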
Abstract
New insights are given into why the problem of detecting multivariate outliers can be difficult and why the difficulty increases with the dimension of the data. Significant improvements in methods for detecting outliers are described, and extensive simulation experiments demonstrate that a hybrid method extends the practical boundaries of outlier detection capabilities. Based on simulation results and examples from the literature, the question of what levels of contamination can be detected by this algorithm as a function of dimension, computation time, sample size, contamination fraction, and distance of the contamination from the main body of data is investigated. Software to implement the methods is available from the authors and STATLIB.
Pages: 1047-1061
Page count: 15