SIMULTANEOUS EDIT AND IMPUTATION FOR HOUSEHOLD DATA WITH STRUCTURAL ZEROS

Authors
Akande, Olanrewaju [1]
Barrientos, Andres [1]
Reiter, Jerome P. [2]
Affiliations
[1] Duke Univ, Dept Stat Sci, POB 90251, Durham, NC 27708 USA
[2] Duke Univ, Stat Sci, Durham, NC 27708 USA
Funding
National Science Foundation (USA)
Keywords
Categorical; Census; Latent; Measurement error; Missing; Mixture; Disclosure limitation
DOI
10.1093/jssam/smy022
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General];
Discipline classification codes
03 ; 0303 ; 0701 ; 070101 ;
Abstract
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than that of the child's biological parent) and have missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
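To make the notion of an edit constraint concrete, the following is a minimal sketch of a household-level check of the kind the abstract mentions (a biological child reported as older than the parent). It is an illustration only, not the authors' engine: the `Person` record, the relationship labels, the `violates_edits` function, and the single-head rule are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    age: int
    relationship: str  # hypothetical labels: "head", "biological_child", ...


def violates_edits(household: List[Person]) -> bool:
    """Return True if the household is an impossible combination.

    Assumed rules (illustrative, not taken from the paper):
      1. A household has exactly one head.
      2. A biological child of the head must be younger than the head.
    """
    heads = [p for p in household if p.relationship == "head"]
    if len(heads) != 1:
        return True
    head = heads[0]
    for p in household:
        if p.relationship == "biological_child" and p.age >= head.age:
            return True
    return False


# A reported household that fails the age edit, and one that passes.
bad = [Person(30, "head"), Person(35, "biological_child")]
ok = [Person(40, "head"), Person(10, "biological_child")]
print(violates_edits(bad), violates_edits(ok))
```

Under a check like this, households for which the function returns True would play the role of structural zeros: combinations to which a truncated model assigns zero probability, so that imputed households always pass the edits.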
Pages: 498 - 519
Page count: 22
Related papers
50 in total
  • [1] Multiple imputation of missing values in household data with structural zeros
    Akande, Olanrewaju
    Reiter, Jerome
    Barrientos, Andres F.
    [J]. SURVEY METHODOLOGY, 2019, 45 (02) : 271 - 294
  • [2] Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data
    Manrique-Vallier, Daniel
    Reiter, Jerome P.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (520) : 1708 - 1719
  • [3] Simultaneous edit-imputation and disclosure limitation for business establishment data
    Kim, Hang J.
    Reiter, Jerome P.
    Karr, Alan F.
    [J]. JOURNAL OF APPLIED STATISTICS, 2018, 45 (01) : 63 - 82
  • [4] Simultaneous Edit-Imputation for Continuous Microdata
    Kim, Hang J.
    Cox, Lawrence H.
    Karr, Alan F.
    Reiter, Jerome P.
    Wang, Quanli
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (511) : 987 - 999
  • [5] A MODEL FOR GENERALIZED EDIT AND IMPUTATION OF SURVEY DATA
    GILES, P
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 1988, 16 : 57 - 73
  • [6] Bayesian multiple imputation for large-scale categorical data with structural zeros
    Manrique-Vallier, Daniel
    Reiter, Jerome P.
    [J]. SURVEY METHODOLOGY, 2014, 40 (01) : 125 - 134
  • [7] Imputation of numerical data under linear edit restrictions
    Coutinho, Wieger
    de Waal, Ton
    Remmerswaal, Marco
    [J]. SORT-STATISTICS AND OPERATIONS RESEARCH TRANSACTIONS, 2011, 35 (01) : 39 - 62
  • [8] Multiple edit/multiple imputation for multivariate continuous data
    Ghosh-Dastidar, B
    Schafer, JL
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2003, 98 (464) : 807 - 817
  • [9] CALIBRATED IMPUTATION OF NUMERICAL DATA UNDER LINEAR EDIT RESTRICTIONS
    Pannekoek, Jeroen
    Shlomo, Natalie
    De Waal, Ton
    [J]. ANNALS OF APPLIED STATISTICS, 2013, 7 (04) : 1983 - 2006
  • [10] Imputation of rounded zeros for high-dimensional compositional data
    Templ, Matthias
    Hron, Karel
    Filzmoser, Peter
    Gardlo, Alzbeta
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2016, 155 : 183 - 190