Multiple imputation of incomplete multilevel data using Heckman selection models

Cited by: 1
Authors
Munoz, Johanna [1 ,7 ]
Efthimiou, Orestis [2 ,3 ]
Audigier, Vincent [4 ]
de Jong, Valentijn M. T. [1 ,5 ]
Debray, Thomas P. A. [1 ,6 ]
Affiliations
[1] Univ Utrecht, Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
[2] Univ Bern, Inst Primary Hlth Care BIHAM, Bern, Switzerland
[3] Univ Bern, Inst Social & Prevent Med ISPM, Bern, Switzerland
[4] Lab CEDR MSDMA, Conservatoire Natl Arts & Metiers CNAM, Paris, France
[5] European Med Agcy, Data Analyt & Methods Task Force, Amsterdam, Netherlands
[6] Smart Data Anal & Stat, Utrecht, Netherlands
[7] UMC Utrecht, Julius Ctr Hlth Sci & Primary Care, Str 6-131,POB 85500, NL-3508GA Utrecht, Netherlands
Funding
European Union Horizon 2020
Keywords
Heckman model; IPDMA; missing not at random; selection models; multiple imputation; SAMPLE SELECTION; VARIABLES; BIAS;
DOI
10.1002/sim.9965
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07 ; 0710 ; 09
Abstract
Missing data are a common problem in medical research and are commonly addressed using multiple imputation. Although traditional imputation methods allow for valid statistical inference when data are missing at random (MAR), their implementation is problematic when the presence of missingness depends on unobserved variables, that is, when the data are missing not at random (MNAR). Unfortunately, this MNAR situation is rather common in observational studies, registries and other sources of real-world data. While several imputation methods have been proposed for addressing individual studies when data are MNAR, their application and validity in large datasets with multilevel structure remain unclear. We therefore explored the consequences of MNAR data in hierarchical data in depth, and proposed a novel multilevel imputation method for common missing patterns in clustered datasets. This method is based on the principles of Heckman selection models and adopts a two-stage meta-analysis approach to impute binary and continuous variables that may be outcomes or predictors and that are systematically or sporadically missing. After evaluating the proposed imputation model in simulated scenarios, we illustrate its use in a cross-sectional community survey to estimate the prevalence of malaria parasitemia in children aged 2-10 years in five regions in Uganda.
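To make the idea behind Heckman selection models concrete, the following is a minimal single-level sketch in Python. It is not the multilevel imputation method proposed in the article. It simulates a continuous outcome that is MNAR (the error in the selection equation is correlated with the outcome error), shows that the complete-case mean is biased, and applies the classical Heckman two-step correction: a probit model for being observed, followed by a regression of the observed outcomes on the inverse Mills ratio. All function names, the parameter values (mu, rho, the selection coefficients 0.5 and 1.0) and the covariate z acting as an exclusion restriction are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


def simulate(n, mu=1.0, rho=0.7, seed=0):
    """Simulate (z, y, observed): y is observed only if 0.5 + z + u > 0,
    where u is correlated (rho) with the outcome error, so y is MNAR."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # affects selection only
        u = rng.gauss(0, 1)                      # selection error
        e = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        rows.append((z, mu + e, (0.5 + z + u) > 0))
    return rows


def naive_mean(rows):
    """Complete-case mean of y, ignoring the selection mechanism."""
    ys = [y for _, y, s in rows if s]
    return sum(ys) / len(ys)


def fit_probit(rows, iterations=20):
    """Step 1: fit P(observed) = Phi(a + b*z) by Fisher scoring."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, _, s in rows:
            eta = a + b * z
            p = min(max(STD_NORMAL.cdf(eta), 1e-10), 1 - 1e-10)
            d = STD_NORMAL.pdf(eta)
            score = d * ((1.0 if s else 0.0) - p) / (p * (1 - p))
            w = d * d / (p * (1 - p))
            g0 += score
            g1 += score * z
            h00 += w
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det         # solve the 2x2 system
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def heckman_two_step(rows):
    """Step 2: regress observed y on the inverse Mills ratio.
    Returns (intercept, slope); the intercept estimates the mean of y
    and the slope estimates rho times the outcome error SD."""
    a, b = fit_probit(rows)
    pairs = []
    for z, y, s in rows:
        if s:
            eta = a + b * z
            pairs.append((STD_NORMAL.pdf(eta) / STD_NORMAL.cdf(eta), y))
    n = len(pairs)
    mean_l = sum(l for l, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((l - mean_l) * (y - mean_y) for l, y in pairs)
    sxx = sum((l - mean_l) ** 2 for l, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_l, slope


if __name__ == "__main__":
    data = simulate(10000)
    print("complete-case mean:", naive_mean(data))
    print("Heckman two-step (intercept, slope):", heckman_two_step(data))
```

In this sketch the true mean of y is the assumed value mu = 1.0. The complete-case mean overshoots it because individuals with larger outcome errors are more likely to be observed, whereas the two-step intercept should land close to 1.0. The correction depends on z influencing selection but not the outcome; without such a variable, the inverse Mills ratio is identified only through its nonlinearity.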
Pages: 514-533
Page count: 20