Bayesian Multiscale Multiple Imputation With Implications for Data Confidentiality

Cited by: 23
Authors
Holan, Scott H. [1]
Toth, Daniell [2]
Ferreira, Marco A. R. [1]
Karr, Alan F. [3]
Affiliations
[1] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
[2] US Bur Labor Stat, Off Survey Methods Res, Washington, DC 20212 USA
[3] Natl Inst Stat Sci, Res Triangle Pk, NC 27709 USA
Funding
National Science Foundation (USA)
Keywords
Cell suppression; Disclosure; Dynamic linear models; Missing data; Multiscale modeling; QCEW; DISCLOSURE LIMITATION; CELL SUPPRESSION; TABULAR DATA; MICRODATA; MODELS; FRAMEWORK; PRIVACY; SYSTEMS; UTILITY; WORLD;
DOI
10.1198/jasa.2009.ap08629
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Many scientific, sociological, and economic applications present data that are collected on multiple scales of resolution. One particular form of multiscale data arises when data are aggregated across different scales both longitudinally and by economic sector. Such datasets frequently contain missing observations. We propose a method, Bayesian multiscale multiple imputation, that can impute these observations accurately while respecting the constraints imposed by the multiscale nature of the data. Our approach couples dynamic linear models with a novel imputation step based on singular normal distribution theory. Although our method is of independent interest, one important implication of such methodology is its potential effect on confidential databases protected by means of cell suppression. To demonstrate the proposed methodology and to assess the effectiveness of disclosure practices in longitudinal databases, we conduct a large-scale empirical study using the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW). In the course of this empirical investigation, we find that several of the predicted cells are accurate to within 1%, which raises potential concerns for data confidentiality.
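The abstract's point that imputations must respect aggregation constraints can be illustrated with a small sketch. It is not the authors' algorithm: it omits the dynamic linear model entirely and only shows the generic step of drawing several suppressed cells from a multivariate normal conditioned on a published total, which yields a singular normal distribution. The function name `impute_given_total` and all numbers are hypothetical, chosen for illustration only.

```python
import numpy as np

def impute_given_total(mean, cov, total, rng):
    """Draw cell values from N(mean, cov) conditioned on their sum equalling `total`.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Uses the standard correction: draw unconditionally, then shift the draw
    along Cov(x, sum(x)) / Var(sum(x)) so the sum constraint holds exactly.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(mean.shape[0])
    cov_ones = cov @ ones                  # Cov(x, sum(x))
    gain = cov_ones / (ones @ cov_ones)    # entries of gain sum to 1
    draw = rng.multivariate_normal(mean, cov)
    return draw + gain * (total - draw.sum())

# Hypothetical example: three suppressed cells whose published total is 260.
rng = np.random.default_rng(0)
mean = np.array([120.0, 80.0, 50.0])       # assumed predictive means
cov = np.diag([25.0, 16.0, 9.0])           # assumed predictive covariance
imputations = np.array(
    [impute_given_total(mean, cov, 260.0, rng) for _ in range(5)]
)
```

Each row of `imputations` is one completed set of cell values, and every row sums to the published total. Repeating the draw several times is what makes the procedure a multiple imputation rather than a single fill-in.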
Pages: 564-577
Page count: 14