Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models

Cited by: 9
Authors
Jeon, Seungho [1]
Seo, Jeongeun [1]
Kim, Sukyoung [1]
Lee, Jeongmoon [2]
Kim, Jong-Ho [3]
Sohn, Jang Wook [4]
Moon, Jongsub [1]
Joo, Hyung Joon [5]
Affiliations
[1] Korea Univ, Grad Sch Informat Secur, Div Informat Secur, Seoul, South Korea
[2] Korea Univ, Res Inst Med Bigdata Sci, Seoul, South Korea
[3] Korea Univ, Cardiovasc Ctr, Dept Cardiol, Seoul, South Korea
[4] Korea Univ, Coll Med, Dept Internal Med, Div Infect Dis, Seoul, South Korea
[5] Korea Univ, Coll Med, Dept Internal Med, 145 Anam Ro, Seoul 02841, South Korea
Keywords
de-identification; privacy; anonymization; common data model; Observational Health Data Sciences and Informatics
DOI
10.2196/19597
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy.
Objective: This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using an actual CDM database.
Methods: The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models.
Results: The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: the DRUG_EXPOSURE table had the highest re-identifiable record rate in the dataset (11.3%), with a re-identification success rate of 0.03%. However, because every table includes at least one record with a "highest risk" value of 100%, suitable anonymizing techniques are required. Moreover, the CDM database preserves the "source values" (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy that de-identifies the source values, significantly reducing not only the highest risk under the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
Conclusions: Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
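The privacy measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method and does not use the ARX API; the records, the column names (year_of_birth, gender, drug), and the helper functions are hypothetical. The sketch assumes the common definitions: records sharing the same quasi-identifier values form an equivalence class, k is the size of the smallest class, l is the smallest number of distinct sensitive values in any class, and the highest risk is 1 divided by the smallest class size, so a record that is unique on its quasi-identifiers carries a risk of 100%. t-closeness is not covered here.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes


def k_anonymity(classes):
    """k is the size of the smallest equivalence class."""
    return min(len(members) for members in classes.values())


def l_diversity(classes, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in members}) for members in classes.values())


def highest_risk(classes):
    """Worst-case re-identification risk: 1 / smallest class size."""
    return 1.0 / k_anonymity(classes)


def generalize_year(rec, width=10):
    """Hypothetical generalization rule: coarsen birth year to a decade."""
    out = dict(rec)
    out["year_of_birth"] = rec["year_of_birth"] // width * width
    return out


# Hypothetical toy records; not taken from the study's database.
records = [
    {"year_of_birth": 1961, "gender": "F", "drug": "aspirin"},
    {"year_of_birth": 1964, "gender": "F", "drug": "warfarin"},
    {"year_of_birth": 1972, "gender": "M", "drug": "aspirin"},
    {"year_of_birth": 1978, "gender": "M", "drug": "statin"},
]
QI = ["year_of_birth", "gender"]

raw = equivalence_classes(records, QI)
gen = equivalence_classes([generalize_year(r) for r in records], QI)

print("raw:         k =", k_anonymity(raw), " highest risk =", highest_risk(raw))
print("generalized: k =", k_anonymity(gen), " l =", l_diversity(gen, "drug"),
      " highest risk =", highest_risk(gen))
```

In this toy data every raw record is unique on its quasi-identifiers, which is the situation the Results describe as a highest risk of 100%. Coarsening the birth year to a decade merges records into classes of two and halves the worst-case risk, at the cost of less precise data.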
Pages: 14
Related papers
4 in total
  • [1] MI-Common Data Model: Extending Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) for Registering Medical Imaging Metadata and Subsequent Curation Processes
    Kalokyri, Varvara
    Kondylakis, Haridimos
    Sfakianakis, Stelios
    Nikiforaki, Katerina
    Karatzanis, Ioannis
    Mazzetti, Simone
    Tachos, Nikolaos
    Regge, Daniele
    Fotiadis, Dimitrios I.
    Marias, Konstantinos
    Tsiknakis, Manolis
    JCO CLINICAL CANCER INFORMATICS, 2023, 7 : e2300101
  • [2] Conversion of National Health Insurance Service-National Sample Cohort (NHIS-NSC) Database into Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM)
    You, Seng Chan
    Lee, Seongwon
    Cho, Soo-Yeon
    Park, Hojun
    Jung, Sungjae
    Cho, Jaehyeong
    Yoon, Dukyong
    Park, Rae Woong
    MEDINFO 2017: PRECISION HEALTHCARE THROUGH INFORMATICS, 2017, 245 : 467 - 470
  • [3] Identification of patients with drug-resistant epilepsy in electronic medical record data using the Observational Medical Outcomes Partnership Common Data Model
    Castano, Victor G.
    Spotnitz, Matthew
    Waldman, Genna J.
    Joiner, Evan F.
    Choi, Hyunmi
    Ostropolets, Anna
    Natarajan, Karthik
    McKhann, Guy M.
    Ottman, Ruth
    Neugut, Alfred I.
    Hripcsak, George
    Youngerman, Brett E.
    EPILEPSIA, 2022, 63 (11) : 2981 - 2993
  • [4] recruIT: A cloud-native clinical trial recruitment support system based on Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) and the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM)
    Gulden C.
    Macho P.
    Reinecke I.
    Strantz C.
    Prokosch H.-U.
    Blasini R.
    Computers in Biology and Medicine, 2024, 174