Automated outlier detection and estimation of missing data

Cited by: 2
Authors
Rhyu, Jinwook [1]
Bozinovski, Dragana [2]
Dubs, Alexis B. [1]
Mohan, Naresh [2]
Bende, Elizabeth M. Cummings [2]
Maloney, Andrew J. [1]
Nieves, Miriam [2]
Sangerman, Jose
Lu, Amos E. [1]
Hong, Moo Sun [1]
Artamonova, Anastasia [2]
Ou, Rui Wen [3]
Barone, Paul W.
Leung, James C. [2]
Wolfrum, Jacqueline M.
Sinskey, Anthony J. [2, 3]
Springs, Stacy L.
Braatz, Richard D. [1, 2, 4]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[2] MIT, Ctr Biomed Innovat, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, Dept Biol, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[4] MIT, Dept Chem Engn, 77 Massachusetts Ave,Room E19-551, Cambridge, MA 02139 USA
Keywords
Multivariate statistics; Statistical quality control; Principal component analysis; Outlier detection; Biomanufacturing; PRINCIPAL COMPONENT ANALYSIS; ROBUST PCA; MATRIX COMPLETION; MOTION CORRECTION; IDENTIFICATION; VALUES; PERSPECTIVES; NUMBER
DOI
10.1016/j.compchemeng.2023.108448
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The majority of algorithms used for data imputation are based on latent variable methods. The presence of outliers in process data, however, distorts the estimated latent relations among variables, resulting in inaccurate estimation of missing values. This article proposes an approach for automatically detecting outliers using T² and Q contributions and for estimating missing data using various general-purpose algorithms while reducing the impact of outliers. The software is validated using biomanufacturing data from the production of a monoclonal antibody by Chinese hamster ovary cells in a perfusion bioreactor, for five missingness cases: missing completely at random, sensor drop-out, multi-rate, patterned, and censoring. Based on the normalized root mean squared error and three proposed metrics corresponding to feasibility, plausibility, and rapidity, matrix completion methods are the most effective, except in the censoring case, in which probabilistic principal component analysis-based methods are the most effective.
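The T² and Q statistics named in the abstract can be illustrated with a minimal sketch. It assumes a plain (non-robust) PCA on complete, autoscaled data. The function name `t2_and_q`, the synthetic data, and the injected outlier are all hypothetical and are not taken from the article's software. The abstract refers to per-variable contributions to these statistics; the sketch computes only the per-sample statistics.

```python
import numpy as np

def t2_and_q(X, n_components):
    """Per-sample Hotelling's T^2 and Q (squared prediction error) from PCA.

    Assumes X is complete (no missing entries) with samples in rows.
    """
    # Autoscale each variable to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA via singular value decomposition of the scaled data.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                             # loadings
    eigvals = s[:n_components] ** 2 / (Z.shape[0] - 1)  # score variances
    scores = Z @ P
    # T^2: squared distance within the retained latent subspace.
    t2 = np.sum(scores ** 2 / eigvals, axis=1)
    # Q: squared residual outside the retained latent subspace.
    residual = Z - scores @ P.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q

# Synthetic data with two latent factors (illustrative assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))
X[17] += np.array([0.0, 8.0, 0.0, -8.0, 0.0])  # injected gross error

t2, q = t2_and_q(X, n_components=2)
print("largest T^2 at row", int(np.argmax(t2)))
print("largest Q at row", int(np.argmax(q)))
```

A sample that breaks the correlation structure tends to show a large Q, while a sample that is extreme within the structure tends to show a large T². Because an ordinary PCA fit is itself pulled toward outliers, a robust or iterative variant would be the more careful choice in practice.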
Pages: 39