Multiple imputation for incomplete data with semicontinuous variables

Cited by: 15
Authors
Javaras, KN [1]
Van Dyk, DA [2]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Univ Calif Irvine, Dept Stat, Irvine, CA 92697 USA
Keywords
data augmentation; EM algorithm; general location model; missing data; survey data
DOI
10.1198/016214503000000611
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We consider the application of multiple imputation to data containing not only partially missing categorical and continuous variables, but also partially missing 'semicontinuous' variables (variables that take on a single discrete value with positive probability but are otherwise continuously distributed). As an imputation model for data sets of this type, we introduce an extension of the standard general location model proposed by Olkin and Tate; our extension, the blocked general location model, provides a robust and general strategy for handling partially observed semicontinuous variables. In particular, we incorporate a two-level model for the semicontinuous variables into the general location model. The first level models the probability that the semicontinuous variable takes on its point mass value, and the second level models the distribution of the variable given that it is not at its point mass. In addition, we introduce EM and data augmentation algorithms for the blocked general location model with missing data; these can be used to generate imputations under the proposed model and have been implemented in publicly available software. We illustrate our model and computational methods via a simulation study and an analysis of a survey of Massachusetts Megabucks Lottery winners.
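The two-level idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the blocked general location model or the authors' EM and data augmentation algorithms: it handles a single semicontinuous variable with no covariates, assumes a point mass at zero and a normal distribution for the continuous part, and plugs in point estimates instead of drawing parameters from a posterior, so the imputations understate uncertainty. All names (`POINT_MASS`, `fit_two_level`, `impute_once`) and the toy data are hypothetical.

```python
import random
import statistics

POINT_MASS = 0.0  # assumed point-mass value (for example, "no winnings")


def fit_two_level(values):
    """Estimate both levels from the observed entries (None marks missing)."""
    observed = [v for v in values if v is not None]
    continuous = [v for v in observed if v != POINT_MASS]
    # Level 1: probability that the variable sits at its point mass.
    p_mass = 1 - len(continuous) / len(observed)
    # Level 2: distribution of the variable given it is not at the point mass
    # (a normal distribution is assumed here purely for illustration).
    mu = statistics.mean(continuous)
    sigma = statistics.stdev(continuous)
    return p_mass, mu, sigma


def impute_once(values, rng):
    """Return one completed copy of `values`; observed entries are kept as-is."""
    p_mass, mu, sigma = fit_two_level(values)
    completed = []
    for v in values:
        if v is not None:
            completed.append(v)
        elif rng.random() < p_mass:
            completed.append(POINT_MASS)            # level-1 draw: at the point mass
        else:
            completed.append(rng.gauss(mu, sigma))  # level-2 draw: continuous part
    return completed


# Toy data: about 30% zeros, the rest roughly N(10, 2), about 20% missing.
rng = random.Random(0)
data = []
for _ in range(200):
    value = POINT_MASS if rng.random() < 0.3 else rng.gauss(10.0, 2.0)
    data.append(None if rng.random() < 0.2 else value)

# Multiple imputation produces several completed data sets, not just one.
imputations = [impute_once(data, rng) for _ in range(5)]
```

Each completed data set would then be analysed separately and the results combined; a proper implementation would also propagate parameter uncertainty between imputations, which this sketch omits.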
Pages: 703-715
Page count: 13
Related papers
50 in total
  • [1] Multiple imputation for the analysis of incomplete compound variables
    Zhao, Jiwei
    Cook, Richard J.
    Wu, Changbao
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2015, 43 (02) : 240 - 264
  • [2] Predictive mean matching imputation of semicontinuous variables
    Vink, Gerko
    Frank, Laurence E.
    Pannekoek, Jeroen
    van Buuren, Stef
    [J]. STATISTICA NEERLANDICA, 2014, 68 (01) : 61 - 90
  • [3] A multiple imputation strategy for incomplete longitudinal data
    Landrum, MB
    Becker, MP
    [J]. STATISTICS IN MEDICINE, 2001, 20 (17-18) : 2741 - 2760
  • [4] Multiple Imputation for Incomplete Data in Epidemiologic Studies
    Harel, Ofer
    Mitchell, Emily M.
    Perkins, Neil J.
    Cole, Stephen R.
    Tchetgen, Eric J. Tchetgen
    Sun, BaoLuo
    Schisterman, Enrique F.
    [J]. AMERICAN JOURNAL OF EPIDEMIOLOGY, 2018, 187 (03) : 576 - 584
  • [5] MULTIPLE IMPUTATION FOR CATEGORICAL VARIABLES IN MULTILEVEL DATA
    Kottage, Helani Dilshara
    [J]. BULLETIN OF THE AUSTRALIAN MATHEMATICAL SOCIETY, 2022, 106 (02) : 349 - 350
  • [6] Multiple Imputation and Genetic Programming for Classification with Incomplete Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    [J]. PROCEEDINGS OF THE 2017 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'17), 2017 : 521 - 528
  • [7] Multiple Imputation and Ensemble Learning for Classification with Incomplete Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    Lam Thu Bui
    [J]. INTELLIGENT AND EVOLUTIONARY SYSTEMS, IES 2016, 2017, 8 : 401 - 415
  • [8] Multiple Imputation for Incomplete Data in Environmental Epidemiology Research
    Allotey, Prince Addo
    Harel, Ofer
    [J]. CURRENT ENVIRONMENTAL HEALTH REPORTS, 2019, 6 (02) : 62 - 71
  • [9] A functional multiple imputation approach to incomplete longitudinal data
    He, Yulei
    Yucel, Recai
    Raghunathan, Trivellore E.
    [J]. STATISTICS IN MEDICINE, 2011, 30 (10) : 1137 - 1156