Bayesian Multiscale Multiple Imputation With Implications for Data Confidentiality

Cited by: 23
Authors
Holan, Scott H. [1]
Toth, Daniell [2]
Ferreira, Marco A. R. [1]
Karr, Alan F. [3]
Affiliations
[1] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
[2] US Bur Labor Stat, Off Survey Methods Res, Washington, DC 20212 USA
[3] Natl Inst Stat Sci, Res Triangle Pk, NC 27709 USA
Funding
U.S. National Science Foundation
Keywords
Cell suppression; Disclosure; Dynamic linear models; Missing data; Multiscale modeling; QCEW; DISCLOSURE LIMITATION; CELL SUPPRESSION; TABULAR DATA; MICRODATA; MODELS; FRAMEWORK; PRIVACY; SYSTEMS; UTILITY; WORLD;
DOI
10.1198/jasa.2009.ap08629
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Many scientific, sociological, and economic applications present data that are collected on multiple scales of resolution. One particular form of multiscale data arises when data are aggregated across different scales both longitudinally and by economic sector. Frequently, such datasets experience missing observations in a manner that they can be accurately imputed, while respecting the constraints imposed by the multiscale nature of the data, using the method we propose known as Bayesian multiscale multiple imputation. Our approach couples dynamic linear models with a novel imputation step based on singular normal distribution theory. Although our method is of independent interest, one important implication of such methodology is its potential effect on confidential databases protected by means of cell suppression. In order to demonstrate the proposed methodology and to assess the effectiveness of disclosure practices in longitudinal databases, we conduct a large-scale empirical study using the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW). During the course of our empirical investigation it is determined that several of the predicted cells are within 1% accuracy, thus causing potential concerns for data confidentiality.
Pages: 564-577
Page count: 14