Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis

Cited by: 21
Authors
de Oliveira, Gisele Pinto [1 ]
de Souza Bierrenbach, Ana Luiza [2 ]
de Camargo Junior, Kenneth Rochel [3 ]
Coeli, Claudia Medina [4 ]
Pinheiro, Rejane Sobrino [4 ]
Affiliations
[1] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Programa Posgrad Saude Colet, Rio De Janeiro, RJ, Brazil
[2] Hosp Sirio Libanes, Inst Ensino & Pesquisa, Sao Paulo, SP, Brazil
[3] Univ Estado Rio de Janeiro, Inst Med Social, Rio De Janeiro, RJ, Brazil
[4] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Rio De Janeiro, RJ, Brazil
Source
REVISTA DE SAUDE PUBLICA | 2016 / Vol. 50
Keywords
Tuberculosis; epidemiology; Data Accuracy; Sensitivity and Specificity; Epidemiological Surveillance; statistics & numerical data;
DOI
10.1590/S1518-8787.2016050006327
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, as well as the characteristics of discordant pairs.
METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, each combining three or more fragments of the key variables, with or without modification (Soundex or substring). The probabilistic approach required a cutoff point for the score, above which links would be automatically classified as belonging to the same individual. The cutoff point was obtained by linking the Notifiable Diseases Information System - Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated to assess accuracy.
RESULTS: Sensitivity was 87.2% for probabilistic and 95.2% for deterministic record linkage; specificity was 99.8% and 99.9%, respectively. Missing values for the key variables and low similarity scores for name and date of birth were the main reasons the techniques failed to identify records of the same individual.
CONCLUSIONS: The two techniques showed a high level of correlation for pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User need and experience should be considered when choosing the best technique to be used.
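The abstract does not list the 70 rules or name the key variables, so the following is only a minimal sketch of what one deterministic rule and the accuracy calculation might look like. The record fields (`name`, `birth_date`), the choice of fragments (Soundex of the first and last name tokens plus the birth year as a substring), and all function names are assumptions made for illustration, not the authors' algorithm.

```python
# Hypothetical sketch: one deterministic linkage rule built from three
# fragments, plus sensitivity/specificity against a manual review.

# Standard American Soundex digit groups.
_CODES = {
    ch: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for ch in letters
}


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so a repeated code is kept
    return "".join(out)[:4].ljust(4, "0")


def rule_key(record):
    """Assumed rule: Soundex(first name), Soundex(last name), birth year."""
    tokens = record.get("name", "").split()
    dob = record.get("birth_date", "")  # assumed format "YYYY-MM-DD"
    if not tokens or len(dob) < 4:
        return None  # a missing key variable means the rule cannot fire
    return (soundex(tokens[0]), soundex(tokens[-1]), dob[:4])


def is_duplicate(a, b):
    """Two records link under this rule when all three fragments agree."""
    key_a, key_b = rule_key(a), rule_key(b)
    return key_a is not None and key_a == key_b


def sensitivity_specificity(predicted, truth):
    """Compare predicted links with the manual-review classification."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

In this sketch a record with an empty name or date of birth never matches, which mirrors the abstract's observation that missing key variables were a main cause of missed duplicates. A real rule set would combine many such rules so that one rule can recover pairs another misses.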
Pages: 9
Related papers
50 in total
  • [1] Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage
    Chipperfield, James
    Hansen, Noel
    Rossiter, Peter
    [J]. INTERNATIONAL STATISTICAL REVIEW, 2018, 86 (02) : 219 - 236
  • [2] Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage
    Tromp, Miranda
    Ravelli, Anita C.
    Bonsel, Gouke J.
    Hasman, Arie
    Reitsma, Johannes B.
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2011, 64 (05) : 565 - 572
  • [3] Deterministic and Probabilistic Record Linkage: an Application to Primary Care Data
    Carreras, Giulia
    Simonetti, Monica
    Cricelli, Claudio
    Lapi, Francesco
    [J]. JOURNAL OF MEDICAL SYSTEMS, 2018, 42 (05)
  • [4] Deterministic and Probabilistic Record Linkage: an Application to Primary Care Data
    Giulia Carreras
    Monica Simonetti
    Claudio Cricelli
    Francesco Lapi
    [J]. Journal of Medical Systems, 2018, 42
  • [5] A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology
    Ong, Toan C.
    Duca, Lindsey M.
    Kahn, Michael G.
    Crume, Tessa L.
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2020, 27 (04) : 505 - 513
  • [6] Detecting Duplicates at Hospital Admission: Comparison of Deterministic and Probabilistic Record Linkage
    Waldenburger, Andreas
    Nasseh, Daniel
    Stausberg, Juergen
    [J]. UNIFYING THE APPLICATIONS AND FOUNDATIONS OF BIOMEDICAL AND HEALTH INFORMATICS, 2016, 226 : 135 - 138
  • [7] Probabilistic record linkage
    Sayers, Adrian
    Ben-Shlomo, Yoav
    Blom, Ashley W.
    Steele, Fiona
    [J]. INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2016, 45 (03) : 954 - 964
  • [8] A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage
    Pita, Robespierre
    Mendonca, Everton
    Reis, Sandra
    Barreto, Marcos
    Denaxas, Spiros
    [J]. BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2017, 2017, 10440 : 214 - 227
  • [9] Accuracy of probabilistic record linkage applied to health databases: systematic review
    da Silveira, Daniele Pinto
    Artmann, Elizabeth
    [J]. REVISTA DE SAUDE PUBLICA, 2009, 43 (05) : 875 - 882
  • [10] Inclusion of a deterministic post-processing stage to increase the performance of probabilistic record linkage
    Brustulin, Rafael
    Marson, Poliana Guerino
    [J]. CADERNOS DE SAUDE PUBLICA, 2018, 34 (06):