Imputation of Missing Data in Industrial Databases

Authors
Kamakshi Lakshminarayan
Steven A. Harp
Tariq Samad
Affiliation
[1] Honeywell Technology Center,
Source
Applied Intelligence | 1999 / Volume 11
Keywords
missing data; industrial databases; multiple imputation; machine learning
DOI
Not available
Abstract
A limiting factor for the application of intelligent data analysis (IDA) methods in many domains is the incompleteness of data repositories. Many records have fields that are not filled in, especially when data entry is manual. In addition, a significant fraction of the entries can be erroneous, and there may be no alternative but to discard these records. But the cells of a database are not independent data: statistical relationships constrain, and often determine, missing values. Data imputation, the filling in of missing values for partially missing data, can thus be an invaluable first step in many IDA projects. New imputation methods that can handle the large-scale problems and large-scale sparsity of industrial databases are needed. To illustrate the incomplete database problem, we analyze one database with instrumentation maintenance and test records for an industrial process. Despite regulatory requirements for process data collection, this database is less than 50% complete. Next, we discuss possible solutions to the missing data problem. Several approaches to imputation are noted and classified into two categories: data-driven and model-based. We then describe two machine-learning-based approaches that we have worked with. These build upon well-known algorithms: AutoClass and C4.5. Several experiments are designed, all using the maintenance database as a common test-bed but with various data splits and algorithmic variations. Results are generally positive, with imputation accuracies of up to 80%. We conclude the paper by outlining some considerations in selecting imputation methods, and by discussing applications of data imputation for intelligent data analysis.
Pages: 259-275
Page count: 16