Bayesian Multiscale Multiple Imputation With Implications for Data Confidentiality

Cited by: 23
Authors
Holan, Scott H. [1]
Toth, Daniell [2]
Ferreira, Marco A. R. [1]
Karr, Alan F. [3]
Affiliations
[1] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
[2] US Bur Labor Stat, Off Survey Methods Res, Washington, DC 20212 USA
[3] Natl Inst Stat Sci, Res Triangle Pk, NC 27709 USA
Funding
U.S. National Science Foundation
Keywords
Cell suppression; Disclosure; Dynamic linear models; Missing data; Multiscale modeling; QCEW; DISCLOSURE LIMITATION; CELL SUPPRESSION; TABULAR DATA; MICRODATA; MODELS; FRAMEWORK; PRIVACY; SYSTEMS; UTILITY; WORLD;
DOI
10.1198/jasa.2009.ap08629
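Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract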
Many scientific, sociological, and economic applications present data that are collected on multiple scales of resolution. One particular form of multiscale data arises when data are aggregated across different scales both longitudinally and by economic sector. Frequently, such datasets experience missing observations in a manner that they can be accurately imputed, while respecting the constraints imposed by the multiscale nature of the data, using the method we propose known as Bayesian multiscale multiple imputation. Our approach couples dynamic linear models with a novel imputation step based on singular normal distribution theory. Although our method is of independent interest, one important implication of such methodology is its potential effect on confidential databases protected by means of cell suppression. In order to demonstrate the proposed methodology and to assess the effectiveness of disclosure practices in longitudinal databases, we conduct a large-scale empirical study using the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW). During the course of our empirical investigation it is determined that several of the predicted cells are within 1% accuracy, thus causing potential concerns for data confidentiality.
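The abstract mentions an imputation step based on singular normal distribution theory that respects the aggregation constraints of multiscale data. The record does not give the authors' algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: four hypothetical quarterly values follow a multivariate normal distribution and must add up to a known annual total. Conditioning on that linear constraint yields a singular normal distribution, and every draw from it satisfies the constraint exactly. The function name, the numbers, and the constraint matrix `A` are illustrative and do not come from the paper.

```python
import numpy as np


def sample_given_linear_constraint(mu, sigma, A, b, rng):
    """Draw x ~ N(mu, sigma) conditioned on the exact constraint A @ x == b.

    Illustrative helper (not from the paper): draws an unconstrained
    sample, then applies the standard conditioning correction so the
    result follows the singular normal distribution on {x : A x = b}.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))

    # Unconstrained draw from the prior N(mu, sigma).
    x = rng.multivariate_normal(mu, sigma)

    # Gain matrix sigma A^T (A sigma A^T)^{-1}; assumes A has full row rank.
    gain = sigma @ A.T @ np.linalg.inv(A @ sigma @ A.T)

    # Shift the draw onto the constraint surface.
    return x + gain @ (b - A @ x)


# Hypothetical example: four suppressed quarterly cells, known annual total.
rng = np.random.default_rng(0)
mu = np.array([100.0, 110.0, 120.0, 130.0])   # assumed prior means
sigma = 25.0 * np.eye(4)                      # assumed prior covariance
A = np.ones((1, 4))                           # constraint: quarters sum to total
annual_total = np.array([480.0])

draws = np.array([
    sample_given_linear_constraint(mu, sigma, A, annual_total, rng)
    for _ in range(5)
])
```

Each row of `draws` is one imputed set of quarterly values. The rows differ from one another, which is what multiple imputation needs, but each row sums to the annual total.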
Pages: 564-577
Page count: 14