Exploring the Complexity of Real-World Health Data Record Linkage - An Exemplary Study Linking Cancer Registry and Claims Data

Authors
Lendle, Nadja [1]
Kollhorst, Bianca [1]
Intemann, Timm [1]
Affiliations
[1] Leibniz Inst Prevent Res & Epidemiol BIPS, Dept Biometry & Data Management, Bremen, Germany
Keywords
administrative healthcare database; data linkage; deterministic linkage; gradient boosting; pharmacoepidemiology; quasi-identifiers
DOI
10.1002/pds.70120
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Purpose: Record linkage based on quasi-identifiers remains an important approach because not every data source provides a comprehensive unique identifier. This study examined why a linkage based on quasi-identifiers failed. In addition, informed algorithms that use information on gold standard links were developed to investigate the linkage quality potentially achievable with quasi-identifiers.

Methods: The study population includes patients from an antidiabetic cohort from German claims data and colorectal cancer patients from two German cancer registries. Linkage algorithms were applied using information on gold standard links. Informed linkage algorithms based on deterministic linkage, logistic regression, random forests, gradient boosting, and neural networks were derived and compared. Descriptive analyses were performed to identify reasons for linkage failure, such as discrepancies between data sources.

Results: A gradient boosting-based linkage approach performed best, achieving a precision (positive predictive value) of 77%, a recall (sensitivity) of 81%, and an F*-measure (combining precision and recall) of 64%. Of 641 patients in GePaRD (the German Pharmacoepidemiological Research Database), 8% were not uniquely identifiable using birth year, sex, area of residence, and year and quarter of diagnosis, whereas 33% of 42 817 cancer registry patients were not uniquely identifiable with these quasi-identifiers.

Conclusions: Linkage of German claims and cancer registry data based on quasi-identifiers results in insufficient linkage quality because subjects cannot be uniquely identified. It is advisable to use unique identifiers from a subsample, if available, to derive informed linkage algorithms for the entire sample. In this setting, the machine learning technique gradient boosting was found to outperform the other methods.
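
To make the quantities in the abstract concrete, the following minimal Python sketch shows how the share of records that are not uniquely identifiable by a set of quasi-identifiers can be counted, how a simple deterministic linkage on those quasi-identifiers might work, and how predicted links can be scored against gold standard links. This is not the authors' algorithm: the field names, the toy records, and all function names are hypothetical, and the F*-measure is assumed here to follow the common definition TP / (TP + FP + FN).

```python
from collections import Counter

# Hypothetical quasi-identifier fields, mirroring those named in the abstract.
QI_FIELDS = ("birth_year", "sex", "area", "dx_year", "dx_quarter")


def make_record(rec_id, *values):
    """Build a toy record from an id and the quasi-identifier values."""
    return {"id": rec_id, **dict(zip(QI_FIELDS, values))}


def qi_key(record):
    """Combine the quasi-identifiers of one record into a hashable key."""
    return tuple(record[f] for f in QI_FIELDS)


def share_not_unique(records):
    """Fraction of records whose quasi-identifier key occurs more than once."""
    counts = Counter(qi_key(r) for r in records)
    return sum(1 for r in records if counts[qi_key(r)] > 1) / len(records)


def deterministic_links(claims, registry):
    """Link two records only if their key is unique in both sources and equal."""
    claims_counts = Counter(qi_key(r) for r in claims)
    registry_counts = Counter(qi_key(r) for r in registry)
    registry_by_key = {qi_key(r): r["id"] for r in registry}
    links = set()
    for r in claims:
        k = qi_key(r)
        if claims_counts[k] == 1 and registry_counts.get(k, 0) == 1:
            links.add((r["id"], registry_by_key[k]))
    return links


def evaluate(predicted, gold):
    """Precision, recall, and F* (assumed: TP / (TP + FP + FN)) of predicted links."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_star = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f_star


# Toy data (entirely made up): c1 and c2 share the same quasi-identifiers.
claims = [
    make_record("c1", 1950, "F", "A", 2015, 1),
    make_record("c2", 1950, "F", "A", 2015, 1),
    make_record("c3", 1962, "M", "B", 2016, 3),
    make_record("c4", 1970, "F", "C", 2017, 2),
]
registry = [
    make_record("r1", 1962, "M", "B", 2016, 3),
    make_record("r2", 1970, "F", "C", 2017, 2),
    make_record("r3", 1950, "F", "A", 2015, 1),
    make_record("r4", 1981, "M", "D", 2018, 4),
]
# Gold standard: c4 and r2 agree on all quasi-identifiers but are different people.
gold_links = {("c1", "r3"), ("c3", "r1")}

predicted_links = deterministic_links(claims, registry)
precision, recall, f_star = evaluate(predicted_links, gold_links)
```

In this toy setup, the non-unique claims records c1 and c2 cannot be linked at all (a missed true link), while c4 is linked to a different person who happens to share the same quasi-identifiers (a false link). Both failure modes correspond to the limited identifiability described in the abstract.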
Pages: 8