Outlier detection in large data sets

Cited by: 53
Authors
Buzzi-Ferraris, Guido [1]
Manenti, Flavio [1]
Affiliations
[1] Politecn Milan, Dipartimento Chim Mat & Ingn Chim Giulio Natta, I-20133 Milan, Italy
Keywords
Outliers; Reliable parameter estimation; Robustness; Large data sets
DOI
10.1016/j.compchemeng.2010.11.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper we propose a method for correctly detecting outliers based on a new technique developed to simultaneously evaluate mean, variance and outliers. This method is capable of self-regulating its robustness to suit the experimental data set under analysis, so as to overcome shortcomings of: (i) non-robust methods such as the least sum of squares; (ii) the need of the user in defining a trimmed sub-set of experimental points such as in least trimmed sum of squares; and (iii) the possibility to read the data set only once to evaluate the mean, variance, and outliers of a population by preserving robustness. © 2010 Published by Elsevier Ltd.
Pages: 388-390
Number of pages: 3
Related papers
50 in total
  • [1] Outlier Detection Forest for Large-Scale Categorical Data Sets
    Sun, Zhipeng
    Du, Hongwei
    Ye, Qiang
    Liu, Chuang
    Kibenge, Patricia Lilian
    Huang, Hui
    Li, Yuying
    [J]. COMPUTATIONAL DATA AND SOCIAL NETWORKS, 2019, 11917 : 45 - 56
  • [2] Efficient biased sampling for approximate clustering and outlier detection in large data sets
    Kollios, G
    Gunopulos, D
    Koudas, N
    Berchtold, S
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (05) : 1170 - 1187
  • [3] Grid-Based Outlier Detection in Large Data Sets for Combine Harvesters
    Gu, Ying
    Ganesan, Ram Kumar
    Bischke, Benjamin
    Bernardi, Ansgar
    Maier, Alexander
    Warkentin, Heinrich
    Steckel, Thilo
    Dengel, Andreas
    [J]. 2017 IEEE 15TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2017, : 811 - 818
  • [4] Outlier Detection by Regression Diagnostics in Large Data
    Nurunnabi, A. A. M.
    Nasser, Mohammed
    [J]. INTERNATIONAL CONFERENCE ON FUTURE COMPUTER AND COMMUNICATIONS, PROCEEDINGS, 2009, : 246 - +
  • [5] Outlier mining in large high-dimensional data sets
    Angiulli, F
    Pizzuti, C
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) : 203 - 215
  • [6] Fast outlier detection for very large log data
    Kim, Seung
    Cho, Nam Wook
    Kang, Bokyoung
    Kang, Suk-Ho
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (08) : 9587 - 9596
  • [7] Outlier detection for multinomial data with a large number of categories
    Yang, Xiaona
    Wang, Zhaojun
    Zi, Xuemin
    [J]. RANDOM MATRICES-THEORY AND APPLICATIONS, 2020, 9 (03)
  • [8] Fast Distributed Outlier Detection in Mixed-Attribute Data Sets
    Matthew Eric Otey
    Amol Ghoting
    Srinivasan Parthasarathy
    [J]. Data Mining and Knowledge Discovery, 2006, 12 : 203 - 228
  • [9] USING DETECTION PERFORMANCE TO ASSESS OUTLIER SIZING ON TRUNCATED DATA SETS
    Skow, Jason
    Krynicki, Joseph W.
    [J]. PROCEEDINGS OF ASME 2023 PRESSURE VESSELS & PIPING CONFERENCE, PVP2023, VOL 7, 2023,
  • [10] Fast distributed outlier detection in mixed-attribute data sets
    Otey, ME
    Ghoting, A
    Parthasarathy, S
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 12 (2-3) : 203 - 228