Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis

Cited by: 21
Authors
de Oliveira, Gisele Pinto [1 ]
de Souza Bierrenbach, Ana Luiza [2 ]
de Camargo Junior, Kenneth Rochel [3 ]
Coeli, Claudia Medina [4 ]
Pinheiro, Rejane Sobrino [4 ]
Affiliations
[1] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Programa Posgrad Saude Colet, Rio De Janeiro, RJ, Brazil
[2] Hosp Sirio Libanes, Inst Ensino & Pesquisa, Sao Paulo, SP, Brazil
[3] Univ Estado Rio de Janeiro, Inst Med Social, Rio De Janeiro, RJ, Brazil
[4] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Rio De Janeiro, RJ, Brazil
Source
REVISTA DE SAUDE PUBLICA | 2016 / Vol. 50
Keywords
Tuberculosis; epidemiology; Data Accuracy; Sensitivity and Specificity; Epidemiological Surveillance; statistics & numerical data
DOI
10.1590/S1518-8787.2016050006327
Chinese Library Classification
R1 [Preventive medicine, hygiene]
Subject classification codes
1004; 120402
Abstract
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, as well as the characteristics of discordant pairs.
METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules based on combinations of fragments of the key variables, with or without modification (Soundex or substring). Each rule was formed by three or more fragments. The probabilistic approach required a cutoff point for the score, above which links would be automatically classified as belonging to the same individual. The cutoff point was obtained by linking the Notifiable Diseases Information System - Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated to assess accuracy.
RESULTS: Sensitivity was 87.2% for probabilistic and 95.2% for deterministic record linkage; specificity was 99.8% and 99.9%, respectively. Missing values in the key variables and low similarity scores for name and date of birth were the main reasons the techniques failed to identify records belonging to the same individual.
CONCLUSIONS: The two techniques showed a high level of correlation in pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User needs and experience should be considered when choosing the most suitable technique.
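The deterministic rules and the accuracy measures described in the abstract can be illustrated with a minimal Python sketch. It is not the study's 70-rule algorithm: the field names (`name`, `mother`, `birth_date`, `sex`), the two rules, and the four records are hypothetical. A pair is linked when every fragment of at least one rule (three or more fragments, some modified with Soundex or a substring) is present and equal, and sensitivity and specificity are computed against a manually reviewed set of true duplicate pairs.

```python
import unicodedata
from itertools import combinations

# Soundex digit for each consonant; vowels, H, W and Y have no code.
SOUNDEX_CODES = {c: d for d, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items() for c in letters}


def normalize(text):
    """Upper-case and strip accents (e.g. 'João' -> 'JOAO')."""
    decomposed = unicodedata.normalize("NFKD", text or "")
    return decomposed.encode("ascii", "ignore").decode().upper().strip()


def soundex(word):
    """American Soundex: first letter plus three digits."""
    letters = [c for c in normalize(word) if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0], SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H/W do not separate letters sharing a code
            prev = digit
    return (code + "000")[:4]


def fragments(rec):
    """Hypothetical key-variable fragments, exact or modified."""
    names = normalize(rec["name"]).split()
    mother = normalize(rec["mother"]).split()
    return {
        "first_name": names[0] if names else "",
        "first_sdx": soundex(names[0]) if names else "",
        "last_sdx": soundex(names[-1]) if names else "",
        "mother_first3": mother[0][:3] if mother else "",  # substring
        "birth_date": rec["birth_date"],
        "birth_year": rec["birth_date"][:4],
        "sex": rec["sex"],
    }


# Hypothetical rules (not the study's); each combines three or more fragments.
RULES = [
    ("first_sdx", "last_sdx", "birth_date", "sex"),
    ("first_name", "mother_first3", "birth_year"),
]


def deterministic_link(a, b):
    """Link if any rule has every fragment non-missing and equal."""
    fa, fb = fragments(a), fragments(b)
    return any(all(fa[k] and fa[k] == fb[k] for k in rule) for rule in RULES)


def sensitivity_specificity(predicted, truth, all_pairs):
    """Compare predicted links with manually reviewed true pairs."""
    tp = len(predicted & truth)
    fn = len(truth - predicted)
    fp = len(predicted - truth)
    tn = len(all_pairs) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Invented example records; pairs (1, 2) and (3, 4) are the same person.
records = {
    1: {"name": "João da Silva", "mother": "Maria da Silva", "birth_date": "1980-05-02", "sex": "M"},
    2: {"name": "Joao da Sylva", "mother": "Maria da Silva", "birth_date": "1980-05-02", "sex": "M"},
    3: {"name": "José Santos", "mother": "Ana Santos", "birth_date": "", "sex": "M"},
    4: {"name": "Jose dos Santos", "mother": "Ana Santos", "birth_date": "", "sex": "M"},
}
all_pairs = set(combinations(sorted(records), 2))
truth = {(1, 2), (3, 4)}

det_links = {p for p in all_pairs if deterministic_link(records[p[0]], records[p[1]])}
sens, spec = sensitivity_specificity(det_links, truth, all_pairs)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# prints sensitivity=0.50 specificity=1.00
```

In this toy data the pair (3, 4) is missed because the birth date is missing, which mirrors the failure mode reported in the results. A probabilistic approach would instead compare a pair score against a cutoff (chosen in the study through manual review and ROC and precision-recall curves); the same evaluation function would still apply.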
Pages: 9
Related papers
  • [41] Children's recognition of the usefulness of a record: Distinguishing deterministic and probabilistic events
    Rapp, Andreas F.
    Wilkening, Friedrich
    [J]. EUROPEAN JOURNAL OF DEVELOPMENTAL PSYCHOLOGY, 2005, 2 (04) : 344 - 363
  • [42] Probabilistic vs. Deterministic Linkage of Large Device Registries to Medicare Data
    Setoguchi, Soko
    Myers, Jessica A.
    Jalbert, Jessica J.
    Chen, Chih-Ying
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2012, 21 : 314 - 314
  • [43] Evaluation of underreporting tuberculosis in Central Italy by means of record linkage
    Melosini, Lorenza
    Vetrano, Umberto
    Dente, Federico L.
    Cristofano, Michele
    Giraldi, Mauro
    Gabbrielli, Luciano
    Novelli, Federica
    Aquilini, Ferruccio
    Rindi, Laura
    Menichetti, Francesco
    Freer, Giulia
    Paggiaro, Pierluigi L.
    [J]. BMC PUBLIC HEALTH, 2012, 12
  • [45] Errors in survival rates caused by routinely used deterministic record linkage methods
    Oberaigner, W.
    [J]. METHODS OF INFORMATION IN MEDICINE, 2007, 46 (04) : 420 - 424
  • [46] Field Weights Computation for Probabilistic Record Linkage in Presence of Missing Data
    Zhang, Yinghao
    Xu, Senlin
    Zheng, Mingfan
    Li, Xinran
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (14)
  • [47] Probabilistic Record Linkage in Astronomy: Directional Cross-Identification and Beyond
    Budavari, Tamas
    Loredo, Thomas J.
    [J]. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 2, 2015, 2 : 113 - 139
  • [48] An efficient validation method of probabilistic record linkage including readmissions and twins
    Tromp, M.
    Ravelli, A. C. J.
    Meray, N.
    Reitsma, J. B.
    Bonsel, G. J.
    [J]. METHODS OF INFORMATION IN MEDICINE, 2008, 47 (04) : 356 - 363
  • [49] Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party
    Lazrig, Ibrahim
    Ong, Toan C.
    Ray, Indrajit
    Ray, Indrakshi
    Jiang, Xiaoqian
    Vaidya, Jaideep
    [J]. 2018 16TH ANNUAL CONFERENCE ON PRIVACY, SECURITY AND TRUST (PST), 2018, : 75 - 84
  • [50] A practical approach for incorporating dependence among fields in probabilistic record linkage
    Daggy, Joanne K.
    Xu, Huiping
    Hui, Siu L.
    Gamache, Roland E.
    Grannis, Shaun J.
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 13