Don't Do Imputation: Dealing with Informative Missing Values in EHR Data Analysis

Cited by: 8
Authors
Li, Jia [1]
Wang, Mengdie [1]
Steinbach, Michael S. [1]
Kumar, Vipin [1]
Simon, Gyorgy J. [2]
Affiliations
[1] Univ Minnesota Twin Cities, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[2] Univ Minnesota Twin Cities, Inst Hlth Informat, Minneapolis, MN USA
Keywords
Missing Value; Missing Not At Random; Informative Missing; Pattern-Wise Learning; No Imputation; Pattern-Mixture Models; Strategies
DOI
10.1109/ICBK.2018.00062
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Missing values pose a significant challenge in data analytics, especially in clinical studies, where data are typically missing not at random (MNAR). Applying techniques (e.g., imputation) that were designed for missing-at-random (MAR) data to MNAR data can lead to bias. In this work, we propose pattern-wise analysis, a collection of methods for building predictive models in the presence of MNAR missing values. This methodology constructs an individual model for each missingness pattern. We show that even the simplest pattern-wise method, Per-Pattern Modeling (PPM), outperforms models built on data sets completed by the most popular imputation methods. PPM faces difficulty when the number of missingness patterns is too high or when individual patterns have too few observations. We developed variants of PPM to overcome these challenges from three complementary perspectives: (i) a model selection perspective, where PPM selects the patterns for which to build models; (ii) a distributional perspective, where the training data set is expanded in a distribution-preserving fashion; and (iii) a causal perspective, where a causal structure for the MNAR mechanism is assumed and exploited to convert the problem from MNAR to MAR. Evaluation of the proposed methods on both synthetic MNAR data and a real-world clinical data set of sepsis patients shows notable improvement over traditional approaches.
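The per-pattern idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a base learner, so ordinary least squares is assumed here, and the function names (`pattern_of`, `fit_ppm`, `predict_ppm`) are hypothetical. Rows are grouped by which features are observed, and one model is fit per group using only that group's observed columns, so no value is ever imputed.

```python
import numpy as np

def pattern_of(row):
    """Missingness pattern of a row: tuple of booleans, True where observed."""
    return tuple(bool(b) for b in ~np.isnan(row))

def fit_ppm(X, y):
    """Fit one least-squares model per missingness pattern (assumed base learner)."""
    groups = {}
    for i, row in enumerate(X):
        groups.setdefault(pattern_of(row), []).append(i)
    models = {}
    for pattern, idx in groups.items():
        cols = np.flatnonzero(pattern)  # indices of observed features
        # Design matrix: observed features plus an intercept column.
        A = np.column_stack([X[idx][:, cols], np.ones(len(idx))])
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        models[pattern] = (cols, coef)
    return models

def predict_ppm(models, X):
    """Route each row to the model trained on its own missingness pattern."""
    preds = np.empty(len(X))
    for i, row in enumerate(X):
        # Raises KeyError for a pattern never seen in training.
        cols, coef = models[pattern_of(row)]
        preds[i] = row[cols] @ coef[:-1] + coef[-1]
    return preds

# Toy data: the outcome follows a different rule when the second feature is missing.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0],
              [1.0, np.nan], [2.0, np.nan], [3.0, np.nan]])
y = np.array([3.0, 5.0, 8.0, 12.0, 14.0, 16.0])
models = fit_ppm(X, y)
new_pred = predict_ppm(models, np.array([[4.0, np.nan]]))
```

In this sketch a pattern with very few rows gives an underdetermined or unstable fit, and a pattern unseen in training cannot be scored at all, which mirrors the difficulties the abstract attributes to plain PPM.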
Pages: 415-422
Page count: 8