Automatic de-identification of textual documents in the electronic health record: a review of recent research

Cited by: 182
Authors
Meystre, Stephane M. [1]
Friedlin, F. Jeffrey [3]
South, Brett R. [1, 2]
Shen, Shuying [1, 2]
Samore, Matthew H. [1, 2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84112 USA
[2] IDEAS Ctr SLCVA Healthcare Syst, Salt Lake City, UT USA
[3] Regenstrief Inst Inc, Med Informat, Indianapolis, IN USA
Source
Keywords
OF-THE-ART; MEDICAL-RECORDS; CLINICAL DOCUMENTS;
DOI
10.1186/1471-2288-10-70
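Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract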
Background: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Institutional Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" method requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often performed manually and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here.
Methods: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers.
Results: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries.
Conclusions: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.
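As a rough illustration of the pattern-matching and dictionary family of methods mentioned in the abstract, the following is a minimal Python sketch. It is not taken from any of the reviewed systems: the function name `deidentify`, the three regular expressions, the placeholder tags, and the tiny name dictionary are all assumptions made for this example, and they cover only a small fraction of the 18 Safe Harbor PHI elements.

```python
import re

# Hypothetical mini-dictionary of surnames (assumption for this sketch);
# real systems rely on far larger name lists.
NAME_DICTIONARY = {"smith", "garcia", "nguyen"}

# Illustrative patterns for a few PHI categories (assumptions, not exhaustive).
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def deidentify(text):
    """Replace pattern-matched PHI and dictionary names with category tags."""
    # Step 1: rule-based pattern matching for structured PHI.
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)

    # Step 2: dictionary lookup, one alphabetic token at a time.
    def replace_name(match):
        word = match.group(0)
        return "[NAME]" if word.lower() in NAME_DICTIONARY else word

    return re.sub(r"\b[A-Za-z]+\b", replace_name, text)


note = "Mr. Smith was seen on 03/14/2009; call 801-555-0100."
print(deidentify(note))
```

A dictionary lookup of this kind has no notion of context, so a clinical eponym such as "Smith fracture" would also be tagged as a name; this is one simple way the "over-scrubbing" issue named in the abstract can arise.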
Pages: 16