A heuristic approach to handling missing data in biologics manufacturing databases

Cited by: 8
Authors
Mante, Jeanet [1]
Gangadharan, Nishanthi [2]
Sewell, David J. [2]
Turner, Richard [3]
Field, Ray [3]
Oliver, Stephen G. [4, 5]
Slater, Nigel [2]
Dikicioglu, Duygu [2, 4]
Affiliations
[1] Pembroke Coll, Cambridge, England
[2] Univ Cambridge, Dept Chem Engn & Biotechnol, Cambridge, England
[3] MedImmune, Biopharmaceut Dev, Cell Sci, Cambridge, England
[4] Univ Cambridge, Cambridge Syst Biol Ctr, Cambridge, England
[5] Univ Cambridge, Dept Biochem, Cambridge, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
Biologics manufacturing data; Missing data; Imputation; Parameter recurrence; Data pre-processing; Multiple imputation
DOI
10.1007/s00449-018-02059-5
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
The biologics sector has amassed a wealth of data over the past three decades, in line with bioprocess development and manufacturing guidelines. Precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. Historical bioprocessing data are likely to comprise experiments that were conducted using different cell lines, to produce different products, and possibly years apart; this gives rise to inter-batch variability, and to missing data points caused by human- and instrument-associated technical oversights. These unavoidable complications necessitate a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in missing information in historical bio-manufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets. Regression imputation was effective, whilst maintaining the existing standard deviation and shape of the distribution, in dynamical datasets with less than 30% missing data. The nature of the missing information, whether Missing Completely At Random, Missing At Random or Missing Not At Random, emerged as the key feature for selecting the imputation method.
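The two imputation strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `mean_impute` and `regression_impute`, the toy `time_h` and `titre` values, and the simplification to a single fully observed predictor (rather than a multivariate regression) are all assumptions made for illustration only.

```python
import math
from statistics import mean


def mean_impute(values):
    """Replace missing entries (NaN) with the mean of the observed entries."""
    observed = [v for v in values if not math.isnan(v)]
    fill = mean(observed)
    return [fill if math.isnan(v) else v for v in values]


def regression_impute(x, y):
    """Fill NaNs in y from a least-squares line fitted to the observed (x, y) pairs.

    Assumes x is fully observed and that at least two distinct x values
    have an observed y (otherwise the slope is undefined).
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(yi)]
    x_bar = mean(xi for xi, _ in pairs)
    y_bar = mean(yi for _, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Keep observed values; predict only where y is missing.
    return [intercept + slope * xi if math.isnan(yi) else yi
            for xi, yi in zip(x, y)]


# Hypothetical toy series: a measurement that rises steadily over time,
# with two readings missing.
time_h = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
titre = [1.0, 3.0, math.nan, 7.0, math.nan, 11.0]

print(mean_impute(titre))
print(regression_impute(time_h, titre))
```

On a trending series like this one, mean substitution fills both gaps with the same constant and so flattens the dynamics, whereas the regression fill follows the trend. This is one way to picture why the abstract separates smooth, non-dynamical datasets from dynamical ones.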
Pages: 657-663
Number of pages: 7