Robust probabilistic PCA with missing data and contribution analysis for outlier detection

Cited by: 95
Authors
Chen, Tao [1]
Martin, Elaine [2]
Montague, Gary [2]
Affiliations
[1] Nanyang Technol Univ, Sch Chem & Biomed Engn, Singapore 637459, Singapore
[2] Univ Newcastle, Sch Chem Engn & Adv Mat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
PRINCIPAL COMPONENTS; COVARIANCE; IDENTIFICATION; MATRIX
DOI
10.1016/j.csda.2009.03.014
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Principal component analysis (PCA) is a widely adopted multivariate data analysis technique whose interpretation rests on both classical linear projection and a probability model, namely probabilistic PCA (PPCA). Recently, robust PPCA models based on the multivariate t-distribution have been proposed to handle data sets that may contain outliers. This paper presents an overview of the robust PPCA technique and further addresses the issue of missing data. An expectation-maximization (EM) algorithm is presented for maximum likelihood estimation of the model parameters in the presence of missing data. When robust PPCA is applied to outlier detection, a contribution analysis method is proposed to identify the variables that contribute most to the occurrence of outliers, providing valuable information about the source of the outlying data. The proposed technique is demonstrated on numerical examples and applied to outlier detection and diagnosis in an industrial fermentation process. (C) 2009 Elsevier B.V. All rights reserved.
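The abstract describes the method only at a high level, so the following is a minimal NumPy sketch of the general idea, not the authors' implementation. It fits a t-distributed PPCA model with an ECM variant of EM, treating NaN entries as missing (the expected latent scale `u` downweights outlying rows, and missing entries are replaced by their conditional means plus a covariance correction). It then scores rows by squared Mahalanobis distance and splits each distance into per-variable terms. The function names, the fixed degrees of freedom `nu`, the random initialisation, and the contribution definition c_j = diff_j (C^{-1} diff)_j are all assumptions for illustration; the paper's own update equations and contribution measure may differ. The data at the end are synthetic.

```python
import numpy as np
from math import lgamma


def robust_ppca_em(X, q, nu=5.0, n_iter=100, seed=0):
    """ECM fit of a multivariate-t PPCA model; NaN entries are treated as missing.

    Assumptions of this sketch: nu is fixed (not estimated), q < D, and every
    row has at least one observed value. Returns (mu, W, sigma2, loglik_history).
    """
    X = np.asarray(X, dtype=float)
    N, D = X.shape
    obs = ~np.isnan(X)
    rng = np.random.default_rng(seed)
    mu = np.nanmean(X, axis=0)
    W = rng.normal(scale=0.1, size=(D, q))  # hypothetical random initialisation
    sigma2 = 1.0
    history = []
    for _ in range(n_iter):
        C = W @ W.T + sigma2 * np.eye(D)
        u = np.empty(N)       # E[u_n | observed part of x_n]
        Xhat = X.copy()       # missing entries replaced by conditional means
        R = np.zeros((D, D))  # summed conditional covariances of missing blocks
        loglik = 0.0
        for n in range(N):
            o, m = obs[n], ~obs[n]
            p = int(o.sum())
            C_oo = C[np.ix_(o, o)]
            diff_o = X[n, o] - mu[o]
            sol = np.linalg.solve(C_oo, diff_o)
            d2 = float(diff_o @ sol)
            u[n] = (nu + p) / (nu + d2)
            if m.any():
                C_mo = C[np.ix_(m, o)]
                Xhat[n, m] = mu[m] + C_mo @ sol
                R[np.ix_(m, m)] += C[np.ix_(m, m)] - C_mo @ np.linalg.solve(C_oo, C_mo.T)
            # Observed-data log-density of a p-variate t(mu_o, C_oo, nu).
            _, logdet = np.linalg.slogdet(C_oo)
            loglik += (lgamma((nu + p) / 2) - lgamma(nu / 2)
                       - 0.5 * p * np.log(nu * np.pi) - 0.5 * logdet
                       - 0.5 * (nu + p) * np.log1p(d2 / nu))
        history.append(loglik)
        # CM-step 1: weighted mean, so rows with small u_n barely move mu.
        mu = u @ Xhat / u.sum()
        # CM-step 2: closed-form PPCA solution on the weighted scatter matrix.
        Dc = Xhat - mu
        S = ((u[:, None] * Dc).T @ Dc + R) / N
        evals, evecs = np.linalg.eigh(S)
        evals, evecs = evals[::-1], evecs[:, ::-1]  # descending order
        sigma2 = float(evals[q:].mean())
        W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2, history


def outlier_contributions(X, mu, W, sigma2):
    """Squared Mahalanobis distance d2 of complete rows under C = W W^T + sigma2 I,
    split into per-variable terms c_j = diff_j * (C^{-1} diff)_j that sum to d2."""
    C = W @ W.T + sigma2 * np.eye(len(mu))
    diff = np.asarray(X, dtype=float) - mu
    contrib = diff * np.linalg.solve(C, diff.T).T
    return contrib.sum(axis=1), contrib


# Synthetic example: 2 latent factors, 5 variables, 5 gross outliers in variable 0.
rng = np.random.default_rng(1)
W_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]], dtype=float)
X = rng.normal(size=(200, 2)) @ W_true.T + 0.1 * rng.normal(size=(200, 5)) + 3.0
X[:5, 0] += 25.0
mask = rng.random(X.shape) < 0.1       # about 10% missing completely at random
mask[mask.all(axis=1), 0] = False      # keep at least one observed value per row
X_missing = np.where(mask, np.nan, X)

mu, W, sigma2, hist = robust_ppca_em(X_missing, q=2, n_iter=60)
d2, contrib = outlier_contributions(X, mu, W, sigma2)
print(np.argsort(d2)[-5:])             # rows with the largest distances
print(contrib[:5].argmax(axis=1))      # variable contributing most for rows 0-4
```

Because each term comes from the full quadratic form, individual contributions can be negative; only their sum is guaranteed to equal the distance. The contribution function assumes complete rows; for incomplete rows, one option would be to substitute the conditional means computed in the E-step.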
Pages: 3706-3716
Page count: 11
Related papers (50 in total)
  • [41] Robust detrending, rereferencing, outlier detection, and inpainting for multichannel data
    de Cheveigne, Alain
    Arzounian, Dorothee
    NEUROIMAGE, 2018, 172: 903-912
  • [42] Robust local outlier detection with statistical parameter for big data
    Lei, Jingsheng
    Jiang, Teng
    Wu, Kui
    Du, Haizhou
    Zhu, Lin
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2015, 30 (05): 411-419
  • [43] Outlier detection in networks with missing links
    Gaucher, Solenne
    Klopp, Olga
    Robin, Genevieve
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 164
  • [44] Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection
    Cai, HanQin
    Liu, Jialin
    Yin, Wotao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [45] Probabilistic principal component analysis-based anomaly detection for structures with missing data
    Ma, Zhi
    Yun, Chung-Bang
    Wan, Hua-Ping
    Shen, Yanbin
    Yu, Feng
    Luo, Yaozhi
    STRUCTURAL CONTROL & HEALTH MONITORING, 2021, 28 (05)
  • [46] Robust statistics for outlier detection
    Rousseeuw, Peter J.
    Hubert, Mia
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 1 (01): 73-79
  • [47] PROBABILISTIC PCA FOR HETEROSCEDASTIC DATA
    Hong, David
    Balzano, Laura
    Fessler, Jeffrey A.
    2019 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2019), 2019: 26-30
  • [48] Robust support vector data description for outlier detection with noise or uncertain data
    Chen, Guijun
    Zhang, Xueying
    Wang, Zizhong John
    Li, Fenglian
    KNOWLEDGE-BASED SYSTEMS, 2015, 90: 129-137
  • [49] Robust Local Outlier Detection
    Du, Haizhou
    Zhao, Shengjie
    Zhang, Daqiang
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015: 116-123
  • [50] Robust principal component analysis for accurate outlier sample detection in RNA-Seq data
    Chen, Xiaoying
    Zhang, Bo
    Wang, Ting
    Bonni, Azad
    Zhao, Guoyan
    BMC BIOINFORMATICS, 21