A mixture model for the analysis of data derived from record linkage

Cited by: 10
Authors
Hof, M. H. P. [1 ]
Zwinderman, A. H. [1 ]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, Dept Clin Epidemiol Biostat & Bioinformat, NL-1105 AZ Amsterdam, Netherlands
Keywords
probabilistic record linkage; EM algorithm; large data sources; combining multiple registries; partially identifying variables; LINKED DATA; DENSITY-ESTIMATION; REGRESSION
DOI
10.1002/sim.6315
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Combining information from two data sources depends on finding records that belong to the same individual (matches). When unique identifiers are not available, we have to rely on partially identifying variables that are registered in both data sources. A risk of relying on these variables is that some records from the two datasets are wrongly linked to each other, which introduces bias in subsequent regression analyses. In this paper, we propose a mixture model in which the indicator of whether two records belong to the same individual is treated as missing. Each pair of records from the two datasets contributes independently to a pairwise pseudo-likelihood, which we maximize with an expectation-maximization algorithm. Each part of the pseudo-likelihood is parameterized by an appropriate (parametric) density function. Moreover, some data structures allow for simplifying assumptions, which make the pseudo-likelihood considerably easier to parameterize. Because the optimization requires a product over all combinations of records from the two datasets, we suggest a procedure that summarizes information from highly unlikely matches. In simulations, the new approach produced accurate estimates in different linkage scenarios. Moreover, the estimator remained accurate in scenarios where previously proposed analysis approaches gave biased results. We applied the method to estimate the association between the pregnancy durations of the first- and second-born children of the same mother, using a register without a mother identifier. Copyright (c) 2014 John Wiley & Sons, Ltd.
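The pairwise mixture idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a single partially identifying variable reduced to an agree/disagree indicator, a normal linear regression for matched pairs, and a fixed normal marginal for non-matched pairs. All function names, parameter names, and simulated values are hypothetical. The sketch loops over every pair and does not implement the paper's summarizing procedure for highly unlikely matches.

```python
import math
import random


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate(n=80, n_keys=100, key_error=0.05, seed=1):
    """Two files on the same n individuals, linked only through a noisy key.

    Assumed truth (illustrative values): y = 1 + 2 * x + N(0, 0.5^2).
    """
    rng = random.Random(seed)
    file_a, file_b = [], []
    for _ in range(n):
        key = rng.randrange(n_keys)
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 0.5)
        # With probability key_error the key in file B is re-drawn at random.
        key_b = rng.randrange(n_keys) if rng.random() < key_error else key
        file_a.append((key, x))
        file_b.append((key_b, y))
    rng.shuffle(file_b)  # the true pairing is not available to the analyst
    return file_a, file_b


def fit_pairwise_mixture(file_a, file_b, n_iter=60):
    """EM for a two-component mixture over all record pairs (i, j)."""
    ys = [y for _, y in file_b]
    mu = sum(ys) / len(ys)
    tau = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys))
    # Each pair: covariate from A, outcome from B, key agreement indicator g.
    pairs = [(x, y, 1 if ka == kb else 0)
             for ka, x in file_a for kb, y in file_b]
    n_pairs = len(pairs)
    p = 1.0 / max(len(file_a), len(file_b))  # starting match probability
    m = 0.9                                  # P(agree | match), starting value
    u = sum(g for _, _, g in pairs) / n_pairs  # P(agree | non-match), start
    a, b, s = mu, 0.0, tau                   # regression starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        w = []
        for x, y, g in pairs:
            f_match = p * (m if g else 1.0 - m) * normal_pdf(y, a + b * x, s)
            f_non = (1.0 - p) * (u if g else 1.0 - u) * normal_pdf(y, mu, tau)
            w.append(f_match / (f_match + f_non))
        # M-step: weighted updates of the mixture and regression parameters.
        sw = sum(w)
        p = sw / n_pairs
        m = sum(wi * g for wi, (_, _, g) in zip(w, pairs)) / sw
        u = sum((1.0 - wi) * g for wi, (_, _, g) in zip(w, pairs)) / (n_pairs - sw)
        xbar = sum(wi * x for wi, (x, _, _) in zip(w, pairs)) / sw
        ybar = sum(wi * y for wi, (_, y, _) in zip(w, pairs)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, (x, _, _) in zip(w, pairs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, (x, y, _) in zip(w, pairs))
        b = sxy / sxx
        a = ybar - b * xbar
        rss = sum(wi * (y - a - b * x) ** 2 for wi, (x, y, _) in zip(w, pairs))
        s = max(math.sqrt(rss / sw), 1e-6)  # guard against a collapsing variance
    return {"intercept": a, "slope": b, "sigma": s,
            "match_prob": p, "m": m, "u": u}


if __name__ == "__main__":
    file_a, file_b = simulate()
    print(fit_pairwise_mixture(file_a, file_b))
```

In this sketch the non-match component is fixed at the empirical marginal of the outcome, one example of the kind of simplifying assumption the abstract mentions. Only the match component carries the regression parameters of interest.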
Pages: 74 - 92
Number of pages: 19
Related papers
50 in total
  • [31] Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model
    Brown, Adrian Paul
    Randall, Sean M.
    JMIR MEDICAL INFORMATICS, 2020, 8 (09)
  • [32] APPLICATION OF LOGIT MODEL TO STATISTICAL RECORD LINKAGE
    ODOROFF, CL
    BIOMETRICS, 1978, 34 (04) : 743 - 743
  • [33] Variable selection for latent class analysis in the presence of missing data with application to record linkage
    Xu, Huiping
    Li, Xiaochun
    Zhang, Zuoyi
    Grannis, Shaun
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2024, 33 (06) : 966 - 980
  • [34] Fast Bayesian Record Linkage for Streaming Data Contexts
    Taylor, Ian
    Kaplan, Andee
    Betancourt, Brenda
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024, 33 (03) : 833 - 844
  • [35] A Unified Record Linkage Strategy for Web Service Data
    Kan, Qin
    Yang, Yujiu
    Zhen, Shiqiang
    Liu, Wenhuang
    THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010 : 253 - 256
  • [36] Record linkage strategies, outpatient procedures, and administrative data
    Roos, LL
    Walld, R
    Wajda, A
    Bond, R
    Hartford, K
    MEDICAL CARE, 1996, 34 (06) : 570 - 582
  • [37] Linkability measures to assess the data characteristics for record linkage
    Ong, Toan C.
    Hill, Andrew
    Kahn, Michael G.
    Lembcke, Lauren R.
    Schilling, Lisa M.
    Grannis, Shaun J.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024
  • [38] Improved quality of tuberculosis data using record linkage
    Bartholomay, Patricia
    de Oliveira, Gisele Pinto
    Pinheiro, Rejane Sobrino
    Nogales Vasconcelos, Ana Maria
    CADERNOS DE SAUDE PUBLICA, 2014, 30 (11) : 2459 - 2469
  • [39] Effective record linkage for mining campaign contribution data
    Giraud-Carrier, C.
    Goodliffe, J.
    Jones, B. M.
    Cueva, S.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 45 (02) : 389 - 416
  • [40] A FORMALIZATION OF RECORD LINKAGE AND ITS APPLICATION TO DATA PROTECTION
    Torra, Vicenc
    Stokes, Klara
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2012, 20 (06) : 907 - 919