Comparing record linkage software programs and algorithms using real-world data

Cited by: 16
Authors
Karr, Alan F. [1]
Taylor, Matthew T. [3]
West, Suzanne L. [1]
Setoguchi, Soko [2]
Kou, Tzuyung D. [4]
Gerhard, Tobias [2]
Horton, Daniel B. [2]
Affiliations
[1] RTI Int, Res Triangle Pk, NC 27709 USA
[2] Rutgers State Univ, Ctr Pharmacoepidemiol & Treatment Sci, Inst Hlth Hlth Care Policy & Aging Res, New Brunswick, NJ USA
[3] Thomas Jefferson Univ, Sidney Kimmel Med Coll, Philadelphia, PA USA
[4] Bristol Myers Squibb, Hopewell, NJ USA
Source
PLOS ONE | 2019 / Vol. 14 / Issue 09
DOI
10.1371/journal.pone.0221459
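Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract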
Linkage of medical databases, including insurer claims and electronic health records (EHRs), is increasingly common. However, few studies have investigated the behavior and output of linkage software. To determine how linkage quality is affected by different algorithms, blocking variables, methods for string matching and weight determination, and decision rules, we compared the performance of 4 nonproprietary linkage software packages linking patient identifiers from noninteroperable inpatient and outpatient EHRs. We linked datasets using first and last name, gender, and date of birth (DOB). We evaluated DOB and year of birth (YOB) as blocking variables and used exact and inexact matching methods. We compared the weights assigned to record pairs and evaluated how matching weights corresponded to a gold standard, medical record number. Deduplicated datasets contained 69,523 inpatient and 176,154 outpatient records, respectively. Linkage runs blocking on DOB produced weights ranging in number from 8 for exact matching to 64,273 for inexact matching. Linkage runs blocking on YOB produced 8 to 916,806 weights. Exact matching matched record pairs with identical test characteristics (sensitivity 90.48%, specificity 99.78%) for the highest ranked group, but algorithms differentially prioritized certain variables. Inexact matching behaved more variably, leading to dramatic differences in sensitivity (range 0.04-93.36%) and positive predictive value (PPV) (range 86.67-97.35%), even for the most highly ranked record pairs. Blocking on DOB led to higher PPV of highly ranked record pairs. An ensemble approach based on averaging scaled matching weights led to modestly improved accuracy. In summary, we found few differences in the rankings of record pairs with the highest matching weights across 4 linkage packages. Performance was more consistent for exact string matching than for inexact string matching. Most methods and software packages performed similarly when comparing matching accuracy with the gold standard. In some settings, an ensemble matching approach may outperform individual linkage algorithms.
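The ensemble idea in the abstract (averaging scaled matching weights, then checking the result against a gold standard) can be illustrated with a minimal Python sketch. The abstract does not say how weights were scaled or how pairs missing from a run were handled, so the min-max scaling, the zero score for missing pairs, the 0.2 cutoff, and all names and toy values below are assumptions for illustration only, not the authors' implementation.

```python
from typing import Dict, Hashable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (inpatient record id, outpatient record id)


def min_max_scale(weights: Dict[Pair, float]) -> Dict[Pair, float]:
    """Rescale one run's matching weights to [0, 1] (assumed scaling method).

    Expects a non-empty dict of weights.
    """
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:  # all weights equal: no ranking information in this run
        return {pair: 0.0 for pair in weights}
    return {pair: (w - lo) / (hi - lo) for pair, w in weights.items()}


def ensemble_weights(runs: List[Dict[Pair, float]]) -> Dict[Pair, float]:
    """Average scaled weights across runs.

    Assumption: a pair that a run did not score contributes 0 for that run.
    """
    scaled = [min_max_scale(run) for run in runs]
    all_pairs = set().union(*(s.keys() for s in scaled))
    return {p: sum(s.get(p, 0.0) for s in scaled) / len(scaled) for p in all_pairs}


def sensitivity_and_ppv(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Sensitivity and positive predictive value against gold-standard matches."""
    true_positives = len(predicted & gold)
    sensitivity = true_positives / len(gold) if gold else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy weights from two hypothetical linkage runs on different scales.
run_a = {("i1", "o1"): 20.0, ("i2", "o2"): 10.0, ("i3", "o9"): 0.0}
run_b = {("i1", "o1"): 8.0, ("i2", "o2"): 2.0, ("i3", "o9"): 4.0}

# Toy gold standard: pairs that share a medical record number.
gold_matches = {("i1", "o1"), ("i2", "o2"), ("i4", "o4")}

combined = ensemble_weights([run_a, run_b])
predicted = {pair for pair, w in combined.items() if w >= 0.2}  # hypothetical cutoff
sens, ppv = sensitivity_and_ppv(predicted, gold_matches)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

Scaling before averaging matters because different packages report weights on different ranges. Without it, the package with the widest range would dominate the average.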
Pages: 16
Related papers
50 in total
  • [1] A comparison of record linkage software and algorithms using real-world data
    West, Suzanne L.
    Karr, Alan
    Taylor, Matthew T.
    Setoguchi, Soko
    Kou, Doug
    Gerhard, Tobias
    Horton, Daniel B.
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2018, 27 : 90 - 90
  • [2] From real-world electronic health record data to real-world results using artificial intelligence
    Knevel, Rachel
    Liao, Katherine P.
    [J]. ANNALS OF THE RHEUMATIC DISEASES, 2023, 82 (03) : 306 - 311
  • [3] REAL-WORLD PROBLEMS WITH REAL-WORLD DATA: ADDRESSING DATA QUALITY IN THE ELECTRONIC HEALTH RECORD
    Anderson, Wesley
    Boyce, Danielle
    Kurtycz, Ruth
    Roddy, Will
    Heavner, Smith
    [J]. CRITICAL CARE MEDICINE, 2024, 52
  • [4] Comparing Password Ranking Algorithms on Real-World Password Datasets
    Yang, Weining
    Li, Ninghui
    Molloy, Ian M.
    Park, Youngja
    Chari, Suresh N.
    [J]. COMPUTER SECURITY - ESORICS 2016, PT I, 2016, 9878 : 69 - 90
  • [5] Comparing ICP variants on real-world data sets
    Pomerleau, Francois
    Colas, Francis
    Siegwart, Roland
    Magnenat, Stephane
    [J]. AUTONOMOUS ROBOTS, 2013, 34 (03) : 133 - 148
  • [6] A Comparison of Statistical Linkage Keys with Bloom Filter-based Encryptions for Privacy-preserving Record Linkage using Real-world Mammography Data
    Schnell, Rainer
    Richter, Anke
    Borgs, Christian
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 5: HEALTHINF, 2017 : 276 - 283
  • [7] COMPARING THE EFFECTIVENESS OF ANTIRESORPTIVES FOR FRACTURE RISK REDUCTION USING REAL-WORLD DATA
    Curtis, J.
    [J]. AGING CLINICAL AND EXPERIMENTAL RESEARCH, 2022, 34 (SUPPL 1) : S102 - S103
  • [8] Manual Evaluation of Record Linkage Algorithm Performance in Four Real-World Datasets
    Gupta, Agrayan K.
    Xu, Huiping
    Li, Xiaochu
    Vest, Joshua R.
    Grannis, Shaun J.
    [J]. APPLIED CLINICAL INFORMATICS, 2024, 15 (03) : 620 - 628
  • [9] Comparing dominance hierarchy methods using a data-splitting approach with real-world data
    Vilette, Chloe
    Bonnell, Tyler
    Henzi, Peter
    Barrett, Louise
    [J]. BEHAVIORAL ECOLOGY, 2020, 31 (06) : 1379 - 1390
  • [10] Real-World Battles with Real-World Data
    Brown, Jeffrey
    Bate, Andrew
    Platt, Robert
    Raebel, Marsha
    Sauer, Brian
    Trifiro, Gianluca
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2017, 26 : 254 - 255