Exploring the Complexity of Real-World Health Data Record Linkage: An Exemplary Study Linking Cancer Registry and Claims Data

Authors
Lendle, Nadja [1]
Kollhorst, Bianca [1]
Intemann, Timm [1]
Affiliations
[1] Leibniz Inst Prevent Res & Epidemiol BIPS, Dept Biometry & Data Management, Bremen, Germany
Keywords
administrative healthcare database; data linkage; deterministic linkage; gradient boosting; pharmacoepidemiology; quasi-identifiers
DOI
10.1002/pds.70120
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Subject classification codes
1004; 120402
Abstract
Purpose: Record linkage based on quasi-identifiers remains an important approach because not every data source provides a comprehensive unique identifier. This study examined why a linkage based on quasi-identifiers fails. In addition, informed algorithms that use information on gold standard links were developed to investigate the linkage quality potentially achievable with quasi-identifiers.
Methods: The study population includes patients from an antidiabetic cohort from German claims data and colorectal cancer patients from two German cancer registries. Linkage algorithms were applied using information on gold standard links. Informed linkage algorithms based on deterministic linkage, logistic regression, random forests, gradient boosting, and neural networks were derived and compared. Descriptive analyses were performed to identify reasons for linkage failure, such as discrepancies between data sources.
Results: A gradient boosting-based linkage approach performed best, achieving a precision (positive predictive value) of 77%, a recall (sensitivity) of 81%, and an F*-measure (combining precision and recall) of 64%. Of 641 patients in GePaRD, 8% were not uniquely identifiable using birth year, sex, area of residence, and year and quarter of diagnosis, whereas 33% of 42 817 cancer registry patients were not uniquely identifiable with these quasi-identifiers.
Conclusions: Linkage of German claims and cancer registry data based on quasi-identifiers yields insufficient linkage quality because subjects cannot be uniquely identified. It is advisable to use unique identifiers from a subsample, if available, to derive informed linkage algorithms for the entire sample. In this study, gradient boosting outperformed the other methods.
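To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It counts records that are not uniquely identifiable on a set of quasi-identifiers, performs a naive exact-match deterministic linkage, and scores the result against gold standard links. All field names, record IDs, and toy records are hypothetical. The sketch also assumes that the F*-measure is the transformation F/(2 - F) of the F-measure, which equals TP/(TP + FP + FN); the abstract does not give the formula. The informed, machine-learning-based algorithms described in the abstract are not reproduced here.

```python
from collections import Counter

# Hypothetical field names for the quasi-identifiers listed in the abstract.
QUASI_IDENTIFIERS = ("birth_year", "sex", "region", "dx_year", "dx_quarter")


def qi_key(record, fields=QUASI_IDENTIFIERS):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[f] for f in fields)


def share_not_unique(records, fields=QUASI_IDENTIFIERS):
    """Fraction of records whose quasi-identifier combination occurs more than once."""
    counts = Counter(qi_key(r, fields) for r in records)
    ambiguous = sum(1 for r in records if counts[qi_key(r, fields)] > 1)
    return ambiguous / len(records)


def deterministic_links(claims, registry, fields=QUASI_IDENTIFIERS):
    """Link records only when the key matches exactly and is unique in both sources."""
    claims_counts = Counter(qi_key(r, fields) for r in claims)
    registry_counts = Counter(qi_key(r, fields) for r in registry)
    registry_ids = {qi_key(r, fields): r["id"] for r in registry}
    links = set()
    for c in claims:
        k = qi_key(c, fields)
        if claims_counts[k] == 1 and registry_counts.get(k, 0) == 1:
            links.add((c["id"], registry_ids[k]))
    return links


def linkage_quality(predicted, gold):
    """Return (precision, recall, F*) for predicted links versus gold standard links."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumed definition: F* = F / (2 - F) = TP / (TP + FP + FN).
    f_star = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f_star


def make_record(rid, birth_year, sex, region, dx_year, dx_quarter):
    """Build one toy record as a dict (purely illustrative data)."""
    return {"id": rid, "birth_year": birth_year, "sex": sex, "region": region,
            "dx_year": dx_year, "dx_quarter": dx_quarter}


claims = [
    make_record("c1", 1950, "F", "A", 2010, 1),
    make_record("c2", 1950, "F", "A", 2010, 1),  # same key as c1: not unique
    make_record("c3", 1960, "M", "B", 2011, 2),
    make_record("c4", 1970, "F", "C", 2012, 3),
]
registry = [
    make_record("r1", 1950, "F", "A", 2010, 1),
    make_record("r3", 1960, "M", "B", 2011, 2),
    make_record("r4", 1970, "F", "C", 2012, 3),
    make_record("r5", 1980, "M", "D", 2013, 4),
]
# Toy gold standard: c4 truly belongs to r5, whose quasi-identifiers differ
# between the sources, so an exact match cannot recover this link.
gold = {("c1", "r1"), ("c3", "r3"), ("c4", "r5")}

predicted = deterministic_links(claims, registry)
print(share_not_unique(claims), linkage_quality(predicted, gold))
```

The toy data illustrate the two failure modes named in the abstract: records sharing one quasi-identifier combination cannot be linked unambiguously, and discrepancies between data sources produce both missed and false links.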
Pages: 8