On Anonymizing Medical Microdata with Large-Scale Missing Values - A Case Study with the FAERS Dataset

Cited by: 0
Authors
Hsiao, Mei-Hui [1]
Lin, Wen-Yang [1]
Hsu, Kuang-Yung [1]
Shen, Zih-Xun [1]
Affiliation
[1] National University of Kaohsiung, Department of Computer Science and Information Engineering, Kaohsiung, Taiwan
DOI
10.1109/embc.2019.8857025
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
As big data analysis becomes one of the main driving forces for productivity and economic growth, concern about the disclosure of individual privacy increases as well, especially for applications that access medical or health data containing personal information. Most contemporary techniques for privacy-preserving data publishing rest on a simple assumption: the data of concern is complete, i.e., contains no missing values. However, this is not the case in the real world. This paper presents our efforts to examine the effect of missing values on medical data privacy. In particular, we examined the US FAERS dataset, a public dataset of adverse drug events released by the US FDA. Following the presumption of the current anonymization paradigm that the data should contain no missing values, we investigated three intuitive strategies for anonymizing the FAERS dataset: including missing values, excluding them, or imputing them. Our results demonstrate how awkward these intuitive strategies are when handling data with a massive amount of missing values. Accordingly, we propose a new strategy, consolidation, together with a corresponding privacy protection model and anonymization algorithm. Experimental results show that our method can prevent privacy disclosure while sustaining data utility for adverse drug reaction (ADR) signal detection.
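The three intuitive strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' consolidation strategy, privacy model, or algorithm. The toy records, the choice of quasi-identifier columns, the `MISSING` placeholder, the use of mode imputation, and all function names are assumptions made purely for illustration. The sketch reports the smallest equivalence-class size, the quantity a k-anonymity check would compare against k.

```python
from collections import Counter

# Hypothetical toy records (age_group, sex, country); None marks a missing value.
# These are NOT FAERS data, only an illustrative stand-in.
records = [
    ("60-69", "F", "US"),
    ("60-69", "F", "US"),
    ("60-69", "F", "US"),
    ("60-69", None, "US"),
    ("30-39", "M", None),
    ("30-39", "M", "JP"),
    ("30-39", "M", "JP"),
    (None, "M", "JP"),
]


def min_group_size(rows):
    """Return the size of the smallest group of identical rows (0 if empty)."""
    if not rows:
        return 0
    return min(Counter(rows).values())


def include_missing(rows):
    """Strategy 1: keep every row and treat 'missing' as an ordinary value."""
    return [tuple("MISSING" if v is None else v for v in r) for r in rows]


def exclude_missing(rows):
    """Strategy 2: drop every row that has at least one missing value."""
    return [r for r in rows if None not in r]


def impute_mode(rows):
    """Strategy 3: fill each missing value with its column's most frequent value."""
    columns = list(zip(*rows))
    modes = [
        Counter(v for v in col if v is not None).most_common(1)[0][0]
        for col in columns
    ]
    return [tuple(m if v is None else v for v, m in zip(r, modes)) for r in rows]


if __name__ == "__main__":
    for name, strategy in [
        ("include", include_missing),
        ("exclude", exclude_missing),
        ("impute", impute_mode),
    ]:
        result = strategy(records)
        print(name, "rows kept:", len(result),
              "smallest group:", min_group_size(result))
```

On this toy input, including missing values leaves singleton groups, excluding them discards three of the eight records, and mode imputation fabricates attribute combinations that never occurred while still leaving singleton groups. This is only a small-scale picture of the trade-offs the abstract refers to, not a reproduction of the paper's experiments.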
Pages: 6505-6508
Number of pages: 4