MULTIPLE IMPUTATION OF INCOMPLETE CATEGORICAL DATA USING LATENT CLASS ANALYSIS

Cited by: 74
Authors
Vermunt, Jeroen K. [1]
van Ginkel, Joost R. [2]
van der Ark, L. Andries [1]
Sijtsma, Klaas [1]
Affiliations
[1] Tilburg Univ, Tilburg, Netherlands
[2] Leiden Univ, NL-2300 RA Leiden, Netherlands
DOI
10.1111/j.1467-9531.2008.00202.x
Chinese Library Classification number
C91 [Sociology]
Subject classification codes
030301; 1204
Abstract
We propose using latent class analysis as an alternative to log-linear analysis for the multiple imputation of incomplete categorical data. Like log-linear models, latent class models can describe complex association structures among the variables in the imputation model. Unlike log-linear models, however, latent class models can be used to build large imputation models containing more than a few categorical variables. To obtain imputations that reflect uncertainty about the unknown model parameters, we use a nonparametric bootstrap procedure as an alternative to the more common fully Bayesian approach. The proposed multiple imputation method, which is implemented in the Latent GOLD software for latent class analysis, is illustrated with two examples. In a simulated data example, we compare the new method with well-established methods such as maximum likelihood estimation with incomplete data and multiple imputation using a saturated log-linear model. This example shows that the proposed method yields unbiased parameter estimates and standard errors. The second example is an application to a typical social science data set containing 79 variables, all of which are included in the imputation model. The proposed method is especially useful for such large data sets, because standard methods for handling missing data in categorical variables break down when the number of variables is this large.
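The abstract describes the core procedure only at a high level: draw a nonparametric bootstrap sample, fit a latent class model to it, and impute the missing values from the fitted model, repeating this once per imputation. The Python sketch below illustrates that idea under stated assumptions; it is not the Latent GOLD implementation. Assumed here: an unrestricted latent class model with a user-chosen number of classes, estimated by EM in which missing items are simply skipped (appropriate under MAR); a small smoothing constant `eps` that keeps probabilities strictly positive; and imputations drawn by first sampling a class from each respondent's posterior membership probabilities and then sampling each missing item from that class's conditional distribution. All function names (`posterior`, `fit_latent_class`, `impute_once`, `multiple_impute`) are hypothetical.

```python
import math
import random


def posterior(row, pi, theta):
    """Posterior class probabilities for one row; missing items (None) are skipped."""
    logs = []
    for c, p in enumerate(pi):
        lp = math.log(p)
        for j, x in enumerate(row):
            if x is not None:
                lp += math.log(theta[c][j][x])
        logs.append(lp)
    top = max(logs)
    w = [math.exp(v - top) for v in logs]  # subtract the max for numerical stability
    s = sum(w)
    return [v / s for v in w]


def fit_latent_class(rows, n_cats, n_classes, rng, n_iter=200, eps=1e-3):
    """EM for an unrestricted latent class model on incomplete rows (MAR assumed)."""
    n = len(rows)
    pi = [1.0 / n_classes] * n_classes
    theta = []
    for _ in range(n_classes):  # random start for the item-response probabilities
        cls = []
        for k in n_cats:
            w = [rng.random() + 0.5 for _ in range(k)]
            cls.append([v / sum(w) for v in w])
        theta.append(cls)
    for _ in range(n_iter):
        posts = [posterior(row, pi, theta) for row in rows]  # E-step
        # M-step; the eps smoothing keeps every probability strictly positive.
        pi = [(sum(p[c] for p in posts) + eps) / (n + n_classes * eps)
              for c in range(n_classes)]
        for c in range(n_classes):
            for j, k in enumerate(n_cats):
                counts = [eps] * k
                for row, p in zip(rows, posts):
                    if row[j] is not None:
                        counts[row[j]] += p[c]
                total = sum(counts)
                theta[c][j] = [v / total for v in counts]
    return pi, theta


def impute_once(rows, pi, theta, rng):
    """Draw a class per row from its posterior, then draw each missing item from that class."""
    completed = []
    for row in rows:
        post = posterior(row, pi, theta)
        c = rng.choices(range(len(pi)), weights=post)[0]
        completed.append([
            x if x is not None
            else rng.choices(range(len(theta[c][j])), weights=theta[c][j])[0]
            for j, x in enumerate(row)
        ])
    return completed


def multiple_impute(rows, n_cats, n_classes, n_imputations, seed=0):
    """Bootstrap the rows, refit the model, and impute the ORIGINAL data, once per imputation."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_imputations):
        boot = [rng.choice(rows) for _ in rows]  # nonparametric bootstrap sample
        pi, theta = fit_latent_class(boot, n_cats, n_classes, rng)
        datasets.append(impute_once(rows, pi, theta, rng))
    return datasets


if __name__ == "__main__":
    # Toy data: three binary items, None marks a missing response.
    data = [[0, 1, None], [1, None, 0], [0, 1, 1], [None, 0, 0], [1, 0, 0], [0, 1, 1]]
    for completed in multiple_impute(data, n_cats=[2, 2, 2], n_classes=2, n_imputations=3):
        print(completed)
```

Refitting the model on a fresh bootstrap sample for each imputation is what propagates parameter uncertainty into the imputed values, playing the role that posterior parameter draws play in a fully Bayesian approach. Because each imputation uses only its own fitted model, class labels that switch between bootstrap replicates do no harm. The completed data sets would then be analyzed separately and the results pooled in the usual multiple-imputation way.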
Pages: 369–397
Number of pages: 29