Automatic De-Identification of Medical Records with a Multilevel Hybrid Semi-Supervised Learning Approach

Cited by: 0
Authors
Nguyen Dong Phuong [1]
Vo Thi Ngoc Chau [2]
Affiliations
[1] Vietnam Natl Univ, Ho Chi Minh City Univ Technol, Ton Duc Thang Univ, Ctr Appl Informat Technol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City Univ Technol, Fac Comp Sci & Engn, Dept Informat Syst, Ho Chi Minh City, Vietnam
Keywords
de-identification; protected health information; electronic medical record; privacy preserving; multilevel hybrid semi-supervised learning; CLINICAL DOCUMENTS; SYSTEM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, sharing electronic medical records (EMRs) with researchers outside the originating institutions has become increasingly important. To preserve the privacy of the patients and institutions involved, the EMRs must be de-identified before they are shared. Although de-identification has been studied worldwide with positive results, especially in the i2b2 (Informatics for Integrating Biology and the Bedside) shared tasks of 2006 and 2014, it is not yet a solved problem and needs further investigation under realistic conditions. In this paper, we propose an automatic de-identification solution in a multilevel hybrid semi-supervised learning paradigm, with a key focus on correctly identifying protected health information (PHI) in the EMRs. Like existing works, our work defines a hybrid approach that combines a machine learning-based method using a conditional random fields model with a rule-based method in a post-processing phase to handle ambiguous PHI types. Nevertheless, our work is more general and practical. First, it considers the structural complexity of each EMR so that each section can be treated according to whether it is structured, semi-structured, or unstructured, which leads to more accurate PHI identification. Second, each EMR is examined at three levels of granularity: a token level in the supervised learning phase, an entity level in the rule-based post-processing phase, and a section level, along with the structural complexity, in the semi-supervised learning phase. These multiple levels of detail give our approach a deeper look at each EMR and make it more effective. Third, our solution is conducted in a self-training manner, so that it can start from a small annotated data set in practice and become more effective as new EMRs arrive over time.
Evaluated on the i2b2 data set against related works, our solution achieves better F-measure values for the AGE, LOCATION, and PHONE PHI types and comparable values for the other PHI types.
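The self-training idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the conditional random fields model is replaced by a toy frequency-based tagger, the single rule (a phone-number pattern) is hypothetical, and the section-level handling of structural complexity is left out. All names (`ToyTokenTagger`, `apply_rules`, `self_train`, `shape`) are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical entity-level rule: a US-style phone number pattern.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")


def shape(token):
    """Map a token to a coarse word shape, e.g. 'Age' -> 'Aaa', '54' -> '99'."""
    s = re.sub(r"[A-Z]", "A", token)
    s = re.sub(r"[a-z]", "a", s)
    return re.sub(r"\d", "9", s)


class ToyTokenTagger:
    """Frequency-based stand-in for the CRF (NOT the model used in the paper)."""

    def fit(self, sentences):
        self.by_token = defaultdict(Counter)
        self.by_shape = defaultdict(Counter)
        for tokens, labels in sentences:
            for tok, lab in zip(tokens, labels):
                self.by_token[tok.lower()][lab] += 1
                self.by_shape[shape(tok)][lab] += 1
        return self

    def predict(self, tokens):
        """Return (labels, confidences); unseen tokens fall back to their shape."""
        labels, confs = [], []
        for tok in tokens:
            counts = self.by_token.get(tok.lower()) or self.by_shape.get(shape(tok))
            if not counts:
                labels.append("O")
                confs.append(0.0)  # nothing known about this token
                continue
            label, n = counts.most_common(1)[0]
            labels.append(label)
            confs.append(n / sum(counts.values()))
        return labels, confs


def apply_rules(tokens, labels):
    """Rule-based post-processing: force PHONE on tokens matching the pattern."""
    return ["PHONE" if PHONE_RE.fullmatch(t) else l for t, l in zip(tokens, labels)]


def self_train(labeled, unlabeled, threshold=0.9, rounds=5):
    """Grow the labeled set with confidently pseudo-labeled sentences."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = ToyTokenTagger().fit(labeled)
        remaining = []
        for tokens in pool:
            labels, confs = model.predict(tokens)
            labels = apply_rules(tokens, labels)
            if min(confs, default=0.0) >= threshold:
                labeled.append((tokens, labels))  # accept as pseudo-labeled data
            else:
                remaining.append(tokens)  # keep for a later round
        if len(remaining) == len(pool):
            break  # no sentence was accepted, so further rounds change nothing
        pool = remaining
    return ToyTokenTagger().fit(labeled), pool
```

In this sketch a sentence is accepted only when its least confident token passes the threshold, which is one conservative choice among several; a real system would take the confidence from the sequence model's own probabilities.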
Pages: 43-48
Page count: 6