On Anonymizing Medical Microdata with Large-Scale Missing Values - A Case Study with the FAERS Dataset

Authors
Hsiao, Mei-Hui [1]
Lin, Wen-Yang [1]
Hsu, Kuang-Yung [1]
Shen, Zih-Xun [1]
Affiliation
[1] National University of Kaohsiung, Department of Computer Science and Information Engineering, Kaohsiung, Taiwan
DOI
10.1109/embc.2019.8857025
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
As big data analysis becomes one of the main driving forces of productivity and economic growth, concern about the disclosure of individual privacy grows as well, especially for applications that access medical or health data containing personal information. Most contemporary techniques for privacy-preserving data publishing rest on a simple assumption: the data of concern are complete, i.e., contain no missing values, which is not the case in the real world. This paper presents our efforts to examine the effect of missing values on medical data privacy. In particular, we examined the US FAERS dataset, a public dataset of adverse drug events released by the US FDA. Following the presumption of the current anonymization paradigm that the data contain no missing values, we investigated three intuitive strategies for anonymizing the FAERS dataset: including missing values, excluding them, or imputing them. Our results show that these intuitive strategies handle data with a massive amount of missing values poorly. Accordingly, we propose a new strategy, consolidation, together with a corresponding privacy protection model and anonymization algorithm. Experimental results show that our method can prevent privacy disclosure while sustaining data utility for adverse drug reaction (ADR) signal detection.
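The three intuitive strategies named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the privacy model, so the sketch uses a k-anonymity-style measure (the size of the smallest group of identical quasi-identifier tuples); the toy records, the column choice, and the function names are hypothetical and are not FAERS data or the authors' code. The proposed consolidation strategy is not described in the abstract and is not implemented here.

```python
from collections import Counter

# Hypothetical toy records: (age_group, sex, country). None marks a missing value.
records = [
    ("60-69", "F", "US"),
    ("60-69", "F", None),
    ("60-69", None, "US"),
    ("30-39", "M", "US"),
    ("30-39", "M", "US"),
    (None, "M", None),
]

def include_missing(rows):
    """Strategy 1: keep missing values, treating 'missing' as an ordinary value."""
    return [tuple("*MISSING*" if v is None else v for v in row) for row in rows]

def exclude_missing(rows):
    """Strategy 2: drop every record that has at least one missing value."""
    return [row for row in rows if None not in row]

def impute_mode(rows):
    """Strategy 3: replace each missing value with its column's most frequent value."""
    modes = []
    for col in range(len(rows[0])):
        counts = Counter(row[col] for row in rows if row[col] is not None)
        modes.append(counts.most_common(1)[0][0])
    return [tuple(modes[c] if v is None else v for c, v in enumerate(row))
            for row in rows]

def min_group_size(rows):
    """Size of the smallest group of identical tuples (the k in a k-anonymity-style check)."""
    return min(Counter(rows).values()) if rows else 0

for name, strategy in [("include", include_missing),
                       ("exclude", exclude_missing),
                       ("impute", impute_mode)]:
    result = strategy(records)
    print(name, "records kept:", len(result), "smallest group:", min_group_size(result))
```

On this toy input, including missing values leaves several records unique, excluding them discards half of the records, and imputation produces larger groups only by writing values that were never reported. These are small-scale versions of the trade-offs the abstract attributes to the three strategies.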
Pages: 6505-6508
Number of pages: 4