On Anonymizing Medical Microdata with Large-Scale Missing Values - A Case Study with the FAERS Dataset

Authors
Hsiao, Mei-Hui [1]
Lin, Wen-Yang [1]
Hsu, Kuang-Yung [1]
Shen, Zih-Xun [1]
Affiliations
[1] Natl Univ Kaohsiung, Dept Comp Sci & Informat Engn, Kaohsiung, Taiwan
DOI
10.1109/embc.2019.8857025
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
As big data analysis becomes one of the main driving forces for productivity and economic growth, concern about the disclosure of individual privacy increases as well, especially for applications that access medical or health data containing personal information. Most contemporary techniques for privacy-preserving data publishing rest on a simple assumption: the data of concern are complete, i.e., contain no missing values. This is not the case in the real world. This paper presents our efforts to examine the effect of missing values on medical data privacy. In particular, we examined the US FAERS dataset, a public dataset of adverse drug events released by the US FDA. Following the presumption of the current anonymization paradigm that the data should contain no missing values, we investigated three intuitive strategies for anonymizing the FAERS dataset: including missing values, excluding them, or performing imputation. Our results demonstrate how poorly these intuitive strategies handle data with a massive amount of missing values. Accordingly, we propose a new strategy, consolidation, together with the corresponding privacy protection model and anonymization algorithm. Experimental results show that our method can prevent privacy disclosure and sustain data utility for adverse drug reaction (ADR) signal detection.
Pages: 6505-6508
Number of pages: 4