Exploring the Complexity of Real-World Health Data Record Linkage - An Exemplary Study Linking Cancer Registry and Claims Data

Cited by: 0
Authors
Lendle, Nadja [1]
Kollhorst, Bianca [1]
Intemann, Timm [1]
Affiliations
[1] Leibniz Inst Prevent Res & Epidemiol BIPS, Dept Biometry & Data Management, Bremen, Germany
Keywords
administrative healthcare database; data linkage; deterministic linkage; gradient boosting; pharmacoepidemiology; quasi-identifiers
DOI
10.1002/pds.70120
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Purpose: Record linkage based on quasi-identifiers remains an important approach because not every data source provides a comprehensive unique identifier. This study examined why a linkage based on quasi-identifiers failed. In addition, informed algorithms that use information on gold standard links were developed to investigate the linkage quality potentially achievable with quasi-identifiers.

Methods: The study population includes patients from an antidiabetic cohort from German claims data and colorectal cancer patients from two German cancer registries. Linkage algorithms were applied using information on gold standard links. Informed linkage algorithms based on deterministic linkage, logistic regression, random forests, gradient boosting, and neural networks were derived and compared. Descriptive analyses were performed to identify reasons for the failure of linkage, such as discrepancies between data sources.

Results: A gradient boosting-based linkage approach performed best, achieving a precision (positive predictive value) of 77%, a recall (sensitivity) of 81%, and an F*-measure (combining precision and recall) of 64%. Of 641 patients in GePaRD (German Pharmacoepidemiological Research Database), 8% were not uniquely identifiable using birth year, sex, area of residence, and year and quarter of diagnosis, whereas 33% of 42 817 cancer registry patients were not uniquely identifiable with these quasi-identifiers.

Conclusions: Linkage of German claims and cancer registry data based on quasi-identifiers results in insufficient linkage quality because subjects cannot be uniquely identified. It is advisable to use unique identifiers from a subsample, if available, to derive informed linkage algorithms for the entire sample. In this setting, gradient boosting outperformed the other methods compared.
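
The two quantities reported in the results can be illustrated with a short sketch. It assumes that the F*-measure is the variant defined as TP / (TP + FP + FN), which equals P*R / (P + R - P*R) for precision P and recall R; the abstract does not spell out the formula. The field names, the helper functions, and the toy records are hypothetical and are not taken from the study data. With the rounded figures from the abstract (77% and 81%) this formula gives roughly 0.65, close to the reported 64%; the small gap may be due to rounding of the published values.

```python
from collections import Counter

# Hypothetical field names for the quasi-identifiers named in the abstract.
QUASI_IDENTIFIERS = ("birth_year", "sex", "region", "diag_year", "diag_quarter")


def f_star(precision: float, recall: float) -> float:
    """F* under the assumed definition P*R / (P + R - P*R) = TP / (TP + FP + FN)."""
    denominator = precision + recall - precision * recall
    return precision * recall / denominator if denominator else 0.0


def share_not_unique(records, keys=QUASI_IDENTIFIERS) -> float:
    """Fraction of records whose quasi-identifier combination occurs more than once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    ambiguous = sum(1 for r in records if combos[tuple(r[k] for k in keys)] > 1)
    return ambiguous / len(records)


# Toy data (invented): the first two records share all quasi-identifiers.
toy_records = [
    {"birth_year": 1950, "sex": "F", "region": "A", "diag_year": 2010, "diag_quarter": 1},
    {"birth_year": 1950, "sex": "F", "region": "A", "diag_year": 2010, "diag_quarter": 1},
    {"birth_year": 1948, "sex": "M", "region": "A", "diag_year": 2011, "diag_quarter": 3},
    {"birth_year": 1950, "sex": "F", "region": "B", "diag_year": 2010, "diag_quarter": 1},
]

print(f"F* from rounded abstract values: {f_star(0.77, 0.81):.3f}")
print(f"Share of toy records not uniquely identifiable: {share_not_unique(toy_records):.2f}")
```

A record counted as not uniquely identifiable here is one that cannot be told apart from at least one other record in the same source, which is the situation the abstract names as the main reason for insufficient linkage quality.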
Pages: 8
Related papers
50 in total
  • [41] LINKING CLAIMS AND ELECTRONIC MEDICAL RECORD (EMR) DATA FOR A HYPERTENSION STUDY
    Danielson, E.
    Chang, S.
    Long, S.
    VALUE IN HEALTH, 2010, 13 (03) : A179 - A179
  • [42] Real-world Overall Survival Using Oncology Electronic Health Record Data: Friends of Cancer Research Pilot
    Lasiter, Laura
    Tymejczyk, Olga
    Garrett-Mayer, Elizabeth
    Baxi, Shrujal
    Belli, Andrew J.
    Boyd, Marley
    Christian, Jennifer B.
    Cohen, Aaron B.
    Espirito, Janet L.
    Hansen, Eric
    Sweetnam, Connor
    Robert, Nicholas J.
    Small, Mackenzie
    Stewart, Mark D.
    Izano, Monika A.
    Wagner, Joseph
    Natanzon, Yanina
    Rivera, Donna R.
    Allen, Jeff
    CLINICAL PHARMACOLOGY & THERAPEUTICS, 2022, 111 (02) : 444 - 454
  • [43] Effectiveness of mepolizumab in severe asthma in Japan: A real-world study using claims data
    Nagase, Hiroyuki
    Tamaoki, Jun
    Suzuki, Takeo
    Nezu, Yasuko
    Katsumata, Masayuki
    Komatsubara, Masaki
    Mu, George
    Yang, Shibing
    Cole, Ashley L.
    Alfonso-Cristancho, Rafael
    CLINICAL AND TRANSLATIONAL ALLERGY, 2021, 11 (08)
  • [44] Epidemiology of Wilson disease in Germany - real-world insights from a claims data study
    Fang, Shona
    Hedera, Peter
    Borchert, Julia
    Schultze, Michael
    Weiss, Karl Heinz
    ORPHANET JOURNAL OF RARE DISEASES, 2024, 19 (01)
  • [45] Use of real-world registry data: a hernia mesh example
    Lee, T. -H.
    Choudhuri, A.
    Ulisney, K.
    Swiger, J.
    Poulose, B.
    Rosen, M.
    Gibeily, G.
    HERNIA, 2020, 24 (03) : 587 - 590
  • [46] Use of real-world registry data: a hernia mesh example
    T.-H. Lee
    A. Choudhuri
    K. Ulisney
    J. Swiger
    B. Poulose
    M. Rosen
    G. Gibeily
    Hernia, 2020, 24 : 587 - 590
  • [47] Dynamic Real World Data Platform Integrating Automated Claims and Registry Data for Pharmacoepidemiologic Studies
    Phillips, Syd
    Araujo, Andre
    Proctor, Charissa
    Malatestinic, William
    Larmore, Cynthia
    Harrold, Leslie R.
    Reed, George W.
    Casso, Deb
    Johnson, Karin
    Stamer, Jeremy
    Oliveria, Susan A.
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2016, 25 : 392 - 393
  • [48] ADDITION OF OPEN ADMINISTRATIVE CLAIMS SIGNIFICANTLY IMPROVES CAPTURE OF MORTALITY IN ELECTRONIC HEALTH RECORD (EHR) - FOCUSED REAL-WORLD DATA (RWD)
    Slipski, L.
    Dub, B.
    Yu, Y.
    Walker, M.
    Natanzon, Y.
    VALUE IN HEALTH, 2024, 27 (12) : S576 - S576
  • [49] EPIDEMIOLOGICAL DISEASE BURDEN OF OVARIAN ENDOMETRIOSIS BASED ON REAL-WORLD HEALTH INSURANCE CLAIMS DATA
    Csakvari, T.
    Elmer, D.
    Kajos, L.
    Ponusz, R.
    Ponusz-Kovacs, D.
    Kovacs, B.
    Endrei, D.
    Boncz, I.
    Bodis, J.
    VALUE IN HEALTH, 2022, 25 (07) : S451 - S451
  • [50] Real-world data of trastuzumab in metastatic cancer
    Vasques, A.
    Baleiras, M.
    Ferreira, A.
    Duarte, T.
    Branco, V.
    Pereira, J.
    Lobo-Martins, S.
    Pinto, M.
    Martins, A.
    ANNALS OF ONCOLOGY, 2022, 33 : S268 - S268