A mixture model for the analysis of data derived from record linkage

Times cited: 10
Authors
Hof, M. H. P. [1]
Zwinderman, A. H. [1]
Affiliation
[1] Univ Amsterdam, Acad Med Ctr, Dept Clin Epidemiol Biostat & Bioinformat, NL-1105 AZ Amsterdam, Netherlands
Keywords
probabilistic record linkage; EM algorithm; large data sources; combining multiple registries; partially identifying variables; LINKED DATA; DENSITY-ESTIMATION; REGRESSION
DOI
10.1002/sim.6315
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Combining information from two data sources depends on finding records that belong to the same individual (matches). Sometimes, unique identifiers per individual are not available, and we have to rely on partially identifying variables that are registered in both data sources. A risk of relying on these variables is that some records from both datasets are wrongly linked to each other, which introduces bias in further regression analyses. In this paper, we propose a mixture model where we treat the indicator whether records belong to the same individual as missing. Each pair of records from both datasets contributes independently to a pairwise pseudo-likelihood, which we maximize with an expectation-maximization algorithm. Each part of the pseudo-likelihood is parameterized by the appropriate (parametric) density function. Moreover, some structures of the data allow for simplifying assumptions, which makes the pseudo-likelihood considerably easier to parameterize. Because the optimization requires a product over all combinations of records from both datasets, we suggest a procedure that summarizes information from highly unlikely matches. With simulations, we showed that the new approach produces accurate estimates in different linkage scenarios. Moreover, the estimator remained accurate in scenarios where previously proposed analysis approaches give biased results. We applied the method to estimation of the association between pregnancy duration of the first and second born children from the same mother from a register without mother identifier. Copyright (c) 2014 John Wiley & Sons, Ltd.
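To make the pairwise mixture idea concrete, here is a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions that the abstract does not specify. There are K binary agreement indicators on the partially identifying variables, and these are assumed conditionally independent given match status. Among true matches, y given x follows a normal linear regression. Among non-matches, y follows a single normal distribution that does not depend on x. The match indicator of each pair is treated as missing, and every pair contributes log(p f_match + (1 - p) f_nonmatch) to a pseudo-log-likelihood that EM maximizes. The function names `em_linkage_mixture`, `simulate_pairs` and `norm_logpdf`, and all simulation settings, are hypothetical. The sketch enumerates all pairs and does not implement the paper's procedure for summarizing highly unlikely matches.

```python
import numpy as np


def norm_logpdf(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * np.log(2 * np.pi * var) - (v - mean) ** 2 / (2 * var)


def em_linkage_mixture(gamma, x, y, n_iter=200, tol=1e-8):
    """EM for a two-component mixture over all record pairs (illustrative sketch).

    gamma : (n_pairs, K) 0/1 agreement on K partially identifying variables
    x, y  : (n_pairs,) covariate from file A and outcome from file B of each pair
    The match indicator of each pair is treated as missing; each pair adds
    log(p * f_match + (1 - p) * f_nonmatch) to a pairwise pseudo-log-likelihood.
    """
    n, K = gamma.shape
    X = np.column_stack([np.ones(n), x])
    # Arbitrary starting values (an assumption of this sketch, not from the paper)
    p, m, u = 0.1, np.full(K, 0.9), np.full(K, 0.1)
    a, b, s2 = 0.0, 0.0, np.var(y)
    mu_u, s2_u = np.mean(y), np.var(y)
    old = -np.inf
    for _ in range(n_iter):
        # E-step: log density of each pair under the match / non-match components
        log_fm = gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m) + norm_logpdf(y, a + b * x, s2)
        log_fu = gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u) + norm_logpdf(y, mu_u, s2_u)
        lm, lu = np.log(p) + log_fm, np.log(1 - p) + log_fu
        top = np.maximum(lm, lu)  # log-sum-exp for numerical stability
        log_mix = top + np.log(np.exp(lm - top) + np.exp(lu - top))
        w = np.exp(lm - log_mix)  # posterior probability that the pair is a match
        loglik = log_mix.sum()
        # M-step: weighted updates of all parameters
        p = w.mean()
        m = np.clip(w @ gamma / w.sum(), 1e-6, 1 - 1e-6)
        u = np.clip((1 - w) @ gamma / (1 - w).sum(), 1e-6, 1 - 1e-6)
        XtW = X.T * w  # weighted least squares for y = a + b*x among matches
        a, b = np.linalg.solve(XtW @ X, XtW @ y)
        s2 = np.sum(w * (y - a - b * x) ** 2) / w.sum()
        mu_u = np.sum((1 - w) * y) / (1 - w).sum()
        s2_u = np.sum((1 - w) * (y - mu_u) ** 2) / (1 - w).sum()
        if abs(loglik - old) < tol:
            break
        old = loglik
    return {"p": p, "m": m, "u": u, "a": a, "b": b, "s2": s2, "w": w}


def simulate_pairs(n=100, K=4, n_cat=20, err=0.05, beta=(1.0, 2.0), sigma=0.5, seed=0):
    """Hypothetical simulation: the same n individuals appear in files A and B."""
    rng = np.random.default_rng(seed)
    ids_a = rng.integers(0, n_cat, size=(n, K))
    # File B copies A's identifiers; each value is redrawn at random with prob. err
    ids_b = np.where(rng.random((n, K)) < err, rng.integers(0, n_cat, size=(n, K)), ids_a)
    x = rng.normal(size=n)  # covariate registered in file A
    y = beta[0] + beta[1] * x + rng.normal(scale=sigma, size=n)  # outcome in file B
    # Enumerate all n * n pairs (record i from A, record j from B)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i, j = i.ravel(), j.ravel()
    gamma = (ids_a[i] == ids_b[j]).astype(float)
    return gamma, x[i], y[j], i == j
```

For example, `em_linkage_mixture(*simulate_pairs()[:3])` fits the model to all 10,000 pairs built from 100 records per file. Because the number of pairs grows as the product of the two file sizes, full enumeration like this becomes expensive for large registries, which is the motivation the abstract gives for summarizing information from highly unlikely matches.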
Pages: 74-92
Number of pages: 19
Related papers (50 in total)
  • [41] Effective record linkage for mining campaign contribution data
    Giraud-Carrier, C.
    Goodliffe, J.
    Jones, B. M.
    Cueva, S.
    Knowledge and Information Systems, 2015, 45: 389-416
  • [42] Data-driven name reduction for record linkage
    Schraagen, Marijn
    Kosters, Walter
    2012 SECOND INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING TECHNOLOGY (INTECH), 2012: 311-316
  • [43] CHILD SPACING ANALYSIS VIA RECORD LINKAGE - NEW DATA PLUS A SUMMING-UP FROM EARLIER REPORTS
    CHRISTENSEN, HT
    MARRIAGE AND FAMILY LIVING, 1963, 25 (03): 272-280
  • [44] Feasibility of cross-vendor linkage of ophthalmic images with electronic health record data: an analysis from the IRIS Registry®
    Mbagwu, Michael
    Chu, Zhongdi
    Borkar, Durga
    Koshta, Alex
    Shah, Nisarg
    Torres, Aracelis
    Kalvaria, Hylton
    Lum, Flora
    Leng, Theodore
    JAMIA OPEN, 2024, 7 (01)
  • [45] A factor mixture analysis model for multivariate binary data
    Cagnone, Silvia
    Viroli, Cinzia
    STATISTICAL MODELLING, 2012, 12 (03): 257-277
  • [46] MIXTURE MODEL APPROACH TO THE ANALYSIS OF HETEROGENEOUS SURVIVAL DATA
    Erisoglu, Ulku
    Erisoglu, Murat
    Erol, Hamza
    PAKISTAN JOURNAL OF STATISTICS, 2012, 28 (01): 115-130
  • [47] ANALYSIS OF LONGITUDINAL DATA USING A FINITE MIXTURE MODEL
    DIETZ, E
    BOHNING, D
    STATISTICAL PAPERS, 1994, 35 (03): 203-210
  • [48] CONDITIONAL INFERENCE ON A MIXTURE MODEL FOR THE ANALYSIS OF COUNT DATA
    YIP, P
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 1991, 20 (07): 2045-2057
  • [49] RECORD LINKAGE - A POWERFUL TOOL FOR EPIDEMIOLOGIC ANALYSIS
    STERN, RS
    ARCHIVES OF DERMATOLOGY, 1986, 122 (12): 1383-1384
  • [50] A medical record linkage analysis of abortion underreporting
    Udry, JR
    Gaughan, M
    Schwingl, PJ
    vandenBerg, BJ
    FAMILY PLANNING PERSPECTIVES, 1996, 28 (05): 228-231