A heuristic approach to handling missing data in biologics manufacturing databases

Cited by: 8
Authors
Mante, Jeanet [1]
Gangadharan, Nishanthi [2]
Sewell, David J. [2]
Turner, Richard [3]
Field, Ray [3]
Oliver, Stephen G. [4, 5]
Slater, Nigel [2]
Dikicioglu, Duygu [2, 4]
Affiliations
[1] Pembroke Coll, Cambridge, England
[2] Univ Cambridge, Dept Chem Engn & Biotechnol, Cambridge, England
[3] MedImmune, Biopharmaceut Dev, Cell Sci, Cambridge, England
[4] Univ Cambridge, Cambridge Syst Biol Ctr, Cambridge, England
[5] Univ Cambridge, Dept Biochem, Cambridge, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
Biologics manufacturing data; Missing data; Imputation; Parameter recurrence; Data pre-processing; Multiple imputation
DOI
10.1007/s00449-018-02059-5
Chinese Library Classification number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
The biologics sector has amassed a wealth of data over the past three decades, in line with bioprocess development and manufacturing guidelines. Precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. Historical bioprocessing data are likely to comprise experiments conducted using different cell lines to produce different products, possibly years apart; this situation causes inter-batch variability, as well as missing data points arising from human- and instrument-associated technical oversights. These unavoidable complications necessitate a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in the missing information in historical bio-manufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets, whereas regression imputation was effective in dynamical datasets with less than 30% missing data, maintaining the existing standard deviation and shape of the distribution. The nature of the missing information, whether Missing Completely At Random, Missing At Random or Missing Not At Random, emerged as the key feature for selecting the imputation method.
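The two imputation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_impute` and `regression_impute`, the single time predictor, and the synthetic titre values are all assumptions made for illustration, and ordinary least squares stands in here for the multivariate regression described in the study.

```python
import numpy as np


def mean_impute(x):
    """Replace NaNs in a 1-D array with the mean of the observed values."""
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    filled[np.isnan(x)] = np.nanmean(x)
    return filled


def regression_impute(y, X):
    """Fill NaNs in y from an ordinary least-squares fit of y on X.

    X holds fully observed predictors (one column per predictor);
    a 1-D X is treated as a single predictor.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(y)), X])
    missing = np.isnan(y)
    # Fit only on rows where y was actually measured.
    coef, *_ = np.linalg.lstsq(design[~missing], y[~missing], rcond=None)
    filled = y.copy()
    filled[missing] = design[missing] @ coef
    return filled


# Synthetic, hypothetical example: a titre that rises linearly with time,
# with two measurements lost.
time = np.arange(10.0)
titre = 2.0 * time + 1.0
observed = titre.copy()
observed[[3, 7]] = np.nan

mean_filled = mean_impute(observed)
reg_filled = regression_impute(observed, time)

print("std of observed values:", np.nanstd(observed))
print("std after mean imputation:", np.std(mean_filled))
print("std after regression imputation:", np.std(reg_filled))
```

On this toy dynamical series, mean substitution places every gap at the same value and so narrows the spread of the data, while the regression fill follows the trend. That is in line with the abstract's observation that regression imputation maintains the standard deviation and shape of the distribution in dynamical datasets.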
Pages: 657-663
Number of pages: 7
Related papers
50 in total
  • [1] A heuristic approach to handling missing data in biologics manufacturing databases
    Jeanet Mante
    Nishanthi Gangadharan
    David J. Sewell
    Richard Turner
    Ray Field
    Stephen G. Oliver
    Nigel Slater
    Duygu Dikicioglu
    [J]. Bioprocess and Biosystems Engineering, 2019, 42 : 657 - 663
  • [2] Comparative methods for handling missing data in large databases
    Henry, Antonia J.
    Hevelone, Nathanael D.
    Lipsitz, Stuart
    Nguyen, Louis L.
    [J]. JOURNAL OF VASCULAR SURGERY, 2013, 58 (05) : 1353 - +
  • [3] Handling of missing data to improve the mining of large feed databases
    Maroto-Molina, F.
    Gomez-Cabrera, A.
    Guerrero-Ginel, J. E.
    Garrido-Varo, A.
    Sauvant, D.
    Tran, G.
    Heuze, V.
    Perez-Marin, D. C.
    [J]. JOURNAL OF ANIMAL SCIENCE, 2013, 91 (01) : 491 - 500
  • [4] Cleaning Disguised Missing Data: A Heuristic Approach
    Hua, Ming
    Pei, Jian
    [J]. KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 950 - 958
  • [5] A hierarchical Bayesian approach for handling missing classification data
    Ketz, Alison C.
    Johnson, Therese L.
    Hooten, Mevin B.
    Hobbs, N. Thompson
    [J]. ECOLOGY AND EVOLUTION, 2019, 9 (06) : 3130 - 3140
  • [6] An IS Approach for Handling Missing Data in Collaborative Medical Research
    Meiller, Yannick
    [J]. AMCIS 2016 PROCEEDINGS, 2016,
  • [7] Missing Data Treatment in Crash Data: A Heuristic Optimization Weighting Approach
    Asgharpour, Sina
    Javadinasr, Mohammadjavad
    Mohammadian, Ryan
    Mohammadian, Abolfazl
    [J]. INTERNATIONAL CONFERENCE ON TRANSPORTATION AND DEVELOPMENT 2023: TRANSPORTATION SAFETY AND EMERGING TECHNOLOGIES, 2023, : 87 - 98
  • [8] Handling of Missing Data
    Budhiraja, Pooja
    Kaplan, Bruce
    Mustafa, Reem A.
    [J]. TRANSPLANTATION, 2020, 104 (01) : 24 - 26
  • [9] HANDLING OF MISSING DATA
    Torres, F.
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2011, 109 : 17 - 17
  • [10] Handling missing data
    [Author unknown]
    [J]. CURRENT PROBLEMS IN CANCER, 2005, 29 (06) : 317 - 325