Exploring the Complexity of Real-World Health Data Record Linkage: An Exemplary Study Linking Cancer Registry and Claims Data

Cited by: 0
Authors
Lendle, Nadja [1]
Kollhorst, Bianca [1]
Intemann, Timm [1]
Affiliations
[1] Leibniz Inst Prevent Res & Epidemiol BIPS, Dept Biometry & Data Management, Bremen, Germany
Keywords
administrative healthcare database; data linkage; deterministic linkage; gradient boosting; pharmacoepidemiology; quasi-identifiers
DOI
10.1002/pds.70120
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Purpose: Record linkage based on quasi-identifiers remains an important approach because not every data source provides a comprehensive unique identifier. This study examined why a linkage based on quasi-identifiers failed. In addition, informed algorithms that use information on gold standard links were developed to investigate the linkage quality potentially achievable with quasi-identifiers.
Methods: The study population includes patients from an antidiabetic cohort in German claims data and colorectal cancer patients from two German cancer registries. Linkage algorithms were applied using information on gold standard links. Informed linkage algorithms based on deterministic linkage, logistic regression, random forests, gradient boosting, and neural networks were derived and compared. Descriptive analyses were performed to identify reasons for linkage failure, such as discrepancies between data sources.
Results: A gradient boosting-based linkage approach performed best, achieving a precision (positive predictive value) of 77%, a recall (sensitivity) of 81%, and an F*-measure (combining precision and recall) of 64%. Of 641 patients in GePaRD (the German Pharmacoepidemiological Research Database), 8% were not uniquely identifiable using birth year, sex, area of residence, and year and quarter of diagnosis, whereas 33% of 42 817 cancer registry patients were not uniquely identifiable with these quasi-identifiers.
Conclusions: Linkage of German claims and cancer registry data based on quasi-identifiers results in insufficient linkage quality because subjects cannot be uniquely identified. It is advisable to use unique identifiers from a subsample, if available, to derive informed linkage algorithms for the entire sample. In this setting, the machine learning technique gradient boosting was found to outperform the other methods.
Pages: 8
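The two quantities the abstract reports, the share of patients not uniquely identifiable by their quasi-identifiers and the F*-measure, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the record: the field names and toy records are hypothetical, and F* is taken to be the common definition F* = P·R / (P + R − P·R), which equals TP / (TP + FP + FN). The abstract itself does not define F*, so this may not match the authors' computation.

```python
from collections import Counter

# Quasi-identifiers named in the abstract; the field names are hypothetical.
QUASI_IDENTIFIERS = (
    "birth_year", "sex", "region", "diagnosis_year", "diagnosis_quarter",
)


def share_not_unique(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose quasi-identifier combination occurs more than once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    ambiguous = sum(n for n in combos.values() if n > 1)
    return ambiguous / len(records)


def f_star(precision, recall):
    """Assumed definition: F* = P*R / (P + R - P*R) = TP / (TP + FP + FN)."""
    denom = precision + recall - precision * recall
    return precision * recall / denom if denom > 0 else 0.0


# Toy records invented for illustration; these are not study data.
toy = [
    {"birth_year": 1950, "sex": "f", "region": "A", "diagnosis_year": 2010, "diagnosis_quarter": 1},
    {"birth_year": 1950, "sex": "f", "region": "A", "diagnosis_year": 2010, "diagnosis_quarter": 1},
    {"birth_year": 1948, "sex": "m", "region": "B", "diagnosis_year": 2011, "diagnosis_quarter": 3},
    {"birth_year": 1962, "sex": "f", "region": "A", "diagnosis_year": 2012, "diagnosis_quarter": 2},
]

print(share_not_unique(toy))          # two of the four toy records collide
print(round(f_star(0.77, 0.81), 3))   # rounded precision and recall from the abstract
```

With the abstract's rounded precision and recall (0.77 and 0.81), this assumed formula gives about 0.65, while the abstract reports 64%, so the paper's exact computation may differ from this sketch.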