Exploring process data with the use of robust outlier detection algorithms

Cited by: 140
Authors
Chiang, LH [1 ]
Pell, RJ [1 ]
Seasholtz, MB [1 ]
Affiliation
[1] Dow Chem Co USA, Analyt Sci Lab, Midland, MI 48667 USA
Keywords
outliers; robust statistics; process data; data preprocessing; scaling methods; TENNESSEE EASTMAN PROBLEM; PRINCIPAL COMPONENTS;
DOI
10.1016/S0959-1524(02)00068-9
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
To implement on-line process monitoring techniques such as principal component analysis (PCA) or partial least squares (PLS), it is necessary to extract data associated with the normal operating conditions from the plant historical database for calibrating the models. One way to do this is to use robust outlier detection algorithms such as resampling by half-means (RHM), smallest half volume (SHV), or ellipsoidal multivariate trimming (MVT) in the off-line model building phase. While RHM and SHV are conceptually clear and statistically sound, their computational requirements are heavy. Closest distance to center (CDC) is proposed in this paper as an alternative for outlier detection. The use of Mahalanobis distance in the initial step of MVT for detecting outliers is known to be ineffective. To improve MVT, CDC is incorporated into MVT. Performance was evaluated relative to the goal of finding the best half of a data set. Data sets were derived from the Tennessee Eastman process (TEP) simulator. Comparable results were obtained for RHM, SHV, and CDC. Better performance was obtained when CDC was incorporated into MVT than when either CDC or MVT was used alone. All robust outlier detection algorithms outperformed the standard PCA algorithm. The effects of auto scaling, robust scaling, and a new scaling approach called modified scaling were investigated. In the presence of multiple outliers, auto scaling was found to degrade the performance of all the robust techniques. Reasonable results were obtained with the use of robust scaling and modified scaling. © 2003 Elsevier Science Ltd. All rights reserved.
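The abstract names the building blocks (robust scaling, CDC, and MVT seeded by CDC) without giving their formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes that robust scaling means median/MAD scaling, that CDC keeps the half of the observations with the smallest Euclidean distance to the robust center, and that the MVT step repeatedly re-estimates the mean and covariance from the retained half and re-selects the half with the smallest Mahalanobis distance. The function names `robust_scale`, `closest_half`, and `trimmed_refine` are hypothetical, and the synthetic data stand in for TEP data.

```python
import numpy as np

def robust_scale(X):
    """Center each column by its median and scale by its MAD (assumed form of robust scaling)."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad == 0, 1.0, mad)  # guard against constant columns
    return (X - med) / mad

def closest_half(Z):
    """CDC-style selection (assumed): indices of the ceil(n/2) rows nearest the origin of scaled data."""
    n = Z.shape[0]
    dist = np.linalg.norm(Z, axis=1)
    return np.sort(np.argsort(dist)[: (n + 1) // 2])

def trimmed_refine(Z, keep, n_iter=20):
    """MVT-style refinement (assumed): iterate mean/covariance on the kept half,
    then keep the same number of rows with the smallest Mahalanobis distance."""
    h = len(keep)
    for _ in range(n_iter):
        mu = Z[keep].mean(axis=0)
        cov = np.cov(Z[keep], rowvar=False)
        diff = Z - mu
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
        new_keep = np.sort(np.argsort(d2)[:h])
        if np.array_equal(new_keep, keep):
            break  # selection has stabilized
        keep = new_keep
    return keep

# Synthetic illustration: 80 "normal" rows plus 20 shifted rows acting as outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[80:] += 10.0

Z = robust_scale(X)
keep = trimmed_refine(Z, closest_half(Z))
print(len(keep), "rows retained; any shifted rows among them:", bool(np.any(keep >= 80)))
```

Seeding the trimming loop with a distance-to-center selection, rather than with Mahalanobis distances from the classical mean and covariance, reflects the abstract's remark that the Mahalanobis-based initial step of MVT is known to be ineffective.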
Pages: 437-449
Number of pages: 13
Related papers
50 in total
  • [1] Outlier Detection Algorithms in Data Mining
    Xi, Jingke
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL I, PROCEEDINGS, 2008, : 94 - 97
  • [2] A Survey of Outlier Detection Algorithms for Data Streams
    Tamboli, Jinita
    Shukla, Madhu
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 3535 - 3540
  • [3] Outlier detection algorithms in data mining systems
    Petrovskiy, MI
    PROGRAMMING AND COMPUTER SOFTWARE, 2003, 29 (04) : 228 - 237
  • [4] Outlier Detection Algorithms in Data Mining Systems
    M. I. Petrovskiy
    Programming and Computer Software, 2003, 29 : 228 - 237
  • [5] Outlier detection algorithms in data mining systems
    Petrovskij, M.I.
    Programmirovanie, 2003, 29 (04):
  • [6] A comparison of outlier detection algorithms for ITS data
    Chen, Shuyan
    Wang, Wei
    van Zuylen, Henk
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (02) : 1169 - 1178
  • [7] Robust transformations and outlier detection with autocorrelated data
    Cerioli, A
    Riani, M
    FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING, 2006, : 262 - +
  • [8] Outlier detection in process plant data
    Chen, J
    Bandoni, A
    Romagnoli, JA
    COMPUTERS & CHEMICAL ENGINEERING, 1998, 22 (4-5) : 641 - 646
  • [9] RODD: Robust Outlier Detection in Data Cubes
    Kuhlmann, Lara
    Wilmes, Daniel
    Mueller, Emmanuel
    Pauly, Markus
    Horn, Daniel
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2023, 2023, 14148 : 325 - 339
  • [10] Outlier detection and robust regression for correlated data
    Yuen, Ka-Veng
    Ortiz, Gilberto A.
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2017, 313 : 632 - 646