Robust probabilistic PCA with missing data and contribution analysis for outlier detection

Times cited: 95
Authors
Chen, Tao [1]
Martin, Elaine [2]
Montague, Gary [2]
Affiliations
[1] Nanyang Technol Univ, Sch Chem & Biomed Engn, Singapore 637459, Singapore
[2] Univ Newcastle, Sch Chem Engn & Adv Mat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
PRINCIPAL COMPONENTS; COVARIANCE; IDENTIFICATION; MATRIX
DOI
10.1016/j.csda.2009.03.014
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Principal component analysis (PCA) is a widely adopted multivariate data analysis technique whose interpretation rests on both classical linear projection and a probability model, namely probabilistic PCA (PPCA). Recently, robust PPCA models based on the multivariate t-distribution have been proposed to handle data sets that may contain outliers. This paper presents an overview of the robust PPCA technique and further addresses the issue of missing data. An expectation-maximization (EM) algorithm is presented for the maximum likelihood estimation of the model parameters in the presence of missing data. When robust PPCA is applied to outlier detection, a contribution analysis method is proposed to identify which variables contribute most to the occurrence of an outlier, providing valuable information about the source of the outlying data. The proposed technique is demonstrated on numerical examples and applied to outlier detection and diagnosis in an industrial fermentation process. (C) 2009 Elsevier B.V. All rights reserved.
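The abstract names two techniques, EM estimation of robust PPCA with missing data and a variable contribution analysis for outliers, without giving formulas. The Python/NumPy sketch below illustrates both under our own assumptions; it is not the authors' algorithm. It uses the common latent-scale form of t-distributed PPCA, x | u ~ N(mu, (W W^T + sigma2 I)/u) with u ~ Gamma(nu/2, nu/2), keeps nu fixed (default 5, an arbitrary choice), treats NaN entries as missing at random, and in each M-step fits W and sigma2 with the closed-form PPCA solution applied to the u-weighted scatter matrix. The contribution index, c_j = r_j (C^-1 r)_j, which sums to the Mahalanobis distance of a sample, is one common definition chosen here as an assumption; the paper's own index may differ. The function names `ppca_fit`, `robust_ppca_em` and `contributions` are hypothetical.

```python
import math

import numpy as np


def ppca_fit(S, q):
    """Closed-form ML PPCA fit (Tipping and Bishop) to a covariance matrix S."""
    evals, evecs = np.linalg.eigh(S)            # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending
    sigma2 = evals[q:].mean()                   # noise = mean of discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2


def robust_ppca_em(X, q, nu=5.0, n_iter=100, tol=1e-8):
    """EM sketch for t-distributed PPCA; NaN marks a missing entry.

    Assumptions: nu is fixed, entries are missing at random, and every row
    has at least one observed value. Returns (mu, W, sigma2, loglik_history).
    """
    N, d = X.shape
    obs = ~np.isnan(X)
    mu = np.nanmean(X, axis=0)
    # start from a PPCA fit to the mean-imputed sample covariance
    W, sigma2 = ppca_fit(np.cov(np.where(obs, X, mu), rowvar=False), q)
    C = W @ W.T + sigma2 * np.eye(d)
    history = []
    for _ in range(n_iter):
        sum_u, sum_ux, sum_uxx, ll = 0.0, np.zeros(d), np.zeros((d, d)), 0.0
        for n in range(N):
            o, m = obs[n], ~obs[n]
            p = int(o.sum())
            Coo = C[np.ix_(o, o)]
            r = X[n, o] - mu[o]
            sol = np.linalg.solve(Coo, r)
            delta = float(r @ sol)               # Mahalanobis distance on observed part
            # log-density of x_o under its p-variate t marginal
            ll += (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
                   - 0.5 * p * math.log(nu * math.pi)
                   - 0.5 * np.linalg.slogdet(Coo)[1]
                   - 0.5 * (nu + p) * math.log1p(delta / nu))
            # E-step: posterior mean of the latent scale u (small for outliers)
            Eu = (nu + p) / (nu + delta)
            xhat = np.where(o, X[n], 0.0)
            cond = np.zeros((d, d))
            if m.any():
                # conditional mean and covariance of the missing block given x_o
                Cmo = C[np.ix_(m, o)]
                xhat[m] = mu[m] + Cmo @ sol
                cond[np.ix_(m, m)] = C[np.ix_(m, m)] - Cmo @ np.linalg.solve(Coo, Cmo.T)
            sum_u += Eu
            sum_ux += Eu * xhat
            sum_uxx += Eu * np.outer(xhat, xhat) + cond
        history.append(ll)
        # M-step: u-weighted mean, then PPCA fit to the u-weighted scatter matrix
        mu = sum_ux / sum_u
        S = (sum_uxx - np.outer(sum_ux, mu) - np.outer(mu, sum_ux)
             + sum_u * np.outer(mu, mu)) / N
        W, sigma2 = ppca_fit(S, q)
        C = W @ W.T + sigma2 * np.eye(d)
        if len(history) > 1 and history[-1] - history[-2] < tol * abs(history[-1]):
            break
    return mu, W, sigma2, history


def contributions(x, mu, C):
    """Hypothetical contribution index: c_j = r_j * (C^-1 r)_j, which sums to delta."""
    o = ~np.isnan(x)
    r = x[o] - mu[o]
    c = np.full(x.shape, np.nan)                 # missing variables get NaN
    c[o] = r * np.linalg.solve(C[np.ix_(o, o)], r)
    return c


rng = np.random.default_rng(0)
N, d, q = 200, 5, 2
X = rng.normal(size=(N, q)) @ rng.normal(size=(d, q)).T + 0.3 * rng.normal(size=(N, d))
X[7, 2] += 1000.0                      # one gross outlier, in variable 2 of row 7
mask = rng.random((N, d)) < 0.1        # drop about 10% of entries...
mask[:, 4] = False                     # ...but keep variable 4 fully observed
mask[7] = False                        # and keep the outlying row complete
X[mask] = np.nan

mu, W, sigma2, hist = robust_ppca_em(X, q)
C = W @ W.T + sigma2 * np.eye(d)
contrib = np.array([contributions(x, mu, C) for x in X])
delta = np.nansum(contrib, axis=1)     # per-row Mahalanobis distance on observed entries
print("most outlying row:", int(np.argmax(delta)))
print("largest contribution in that row:", int(np.nanargmax(contrib[np.argmax(delta)])))
```

In the synthetic demo, one entry of row 7 is shifted by 1000 and roughly 10% of the entries in the first four variables are deleted. The weights E[u] = (nu + p)/(nu + delta) shrink the influence of that row on the fit, so it should stand out by Mahalanobis distance, with variable 2 carrying its largest contribution. Holding nu fixed keeps every M-step in closed form; estimating nu as well would require an additional one-dimensional numerical search.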
Pages: 3706-3716
Page count: 11
Related papers (50 in total)
  • [31] Outlier Detection over Sliding Windows for Probabilistic Data Streams
    王斌
    杨晓春
    王国仁
    于戈
    JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2010, 25 (03) : 389 - 400
  • [32] Sparse Kernel PCA for Outlier Detection
    Das, Rudrajit
    Golatkar, Aditya
    Awate, Suyash P.
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018 : 152 - 157
  • [33] Outlier detection and missing data filling methods for coastal water temperature data
    Cho, Hong Yeon
    Oh, Ji Hee
    Kim, Kyeong Ok
    Shim, Jae Seol
    JOURNAL OF COASTAL RESEARCH, 2013 : 1898 - 1903
  • [34] ROBUST PCA VIA DICTIONARY BASED OUTLIER PURSUIT
    Li, Xingguo
    Ren, Jineng
    Rambhatla, Sirisha
    Xu, Yangyang
    Haupt, Jarvis
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 4699 - 4703
  • [35] Stochastic and Private Nonconvex Outlier-Robust PCA
    Maunu, Tyler
    Yu, Chenyu
    Lerman, Gilad
    MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 190, 2022
  • [36] Outlier Detection Algorithm Based on Robust Component Analysis
    Zheng Cha
    Ji Lixin
    Gao Chao
    Li Shaomei
    Wang Yanchuan
    THIRD INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2018, 10828
  • [37] Exploring process data with the use of robust outlier detection algorithms
    Chiang, LH
    Pell, RJ
    Seasholtz, MB
    JOURNAL OF PROCESS CONTROL, 2003, 13 (05) : 437 - 449
  • [38] Missing data in kernel PCA
    Sanguinetti, Guido
    Lawrence, Neil D.
    MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 751 - 758
  • [39] BRIDGING CONVEX AND NONCONVEX OPTIMIZATION IN ROBUST PCA: NOISE, OUTLIERS AND MISSING DATA
    Chen, Yuxin
    Fan, Jianqing
    Ma, Cong
    Yan, Yuling
    ANNALS OF STATISTICS, 2021, 49 (05) : 2948 - 2971
  • [40] A functional data approach to missing value imputation and outlier detection for traffic flow data
    Chiou, Jeng-Min
    Zhang, Yi-Chen
    Chen, Wan-Hui
    Chang, Chiung-Wen
    TRANSPORTMETRICA B-TRANSPORT DYNAMICS, 2014, 2 (02) : 106 - 129