Data Masking for Chinese Electronic Medical Records with Named Entity Recognition

Cited by: 1
Authors
He, Tianyu [1]
Xu, Xiaolong [1]
Hu, Zhichen [1]
Zhao, Qingzhan [2]
Dai, Jianguo [2]
Dai, Fei [3]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 21000, Peoples R China
[2] Geospatial Informat Engn Res Ctr, Xinjiang Prod & Construct Corps, Shihezi 832003, Peoples R China
[3] Southwest Forestry Univ, Coll Big Data & Intelligent Engn, Kunming 650224, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Named entity recognition; Chinese electronic medical records; data masking; principal component analysis; regular expression;
DOI
10.32604/iasc.2023.036831
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid development of information technology, the digitization of medical records has become a trend. China's large population and numerous medical institutions drive the conversion of paper medical records to electronic medical records. Electronic medical records are the basis for establishing smart hospitals and an important guarantee for achieving medical intelligence, and the massive volume of electronic medical record data is also an important data set for research in the medical field. However, electronic medical records contain a large amount of private patient information, which must be desensitized before the records are used as open resources. To address this problem, this paper proposes data masking for Chinese electronic medical records with named entity recognition. First, the text is vectorized to satisfy the input format required by the model. Second, because input sentences vary in length and the contextual relationship between sentences is not negligible, a neural network model for named entity recognition based on bidirectional long short-term memory (BiLSTM) with conditional random fields (CRF) is constructed. Finally, the data masking operation is performed on the named entity recognition results, mainly using regular expression filtering encryption and principal component analysis (PCA) word vector compression and replacement. In addition, comparison experiments with the hidden Markov model (HMM), the LSTM-CRF model, and the BiLSTM model are conducted. The experimental results show that the method achieves 92.72% accuracy, 92.30% recall, and a 92.51% F1 score, giving it higher accuracy than the other models.
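The masking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the regular expressions, the tag set, or the replacement scheme, so the patterns (`PATTERNS`), the `NAME` entity type, the BIO tag format, and the function names below are all assumptions made for illustration. The sketch replaces characters that a recognizer tagged as sensitive with asterisks, then applies regular expression filtering to structured identifiers. The PCA word vector replacement is not shown.

```python
import re

# Hypothetical patterns: the abstract does not list the expressions it uses.
PATTERNS = {
    # 18-character resident ID number (assumed format)
    "ID": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # 11-digit mainland mobile number (assumed format)
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def mask_by_tags(chars, tags, sensitive=("NAME",)):
    """Replace each character tagged B-X or I-X with '*' when X is sensitive.

    `tags` stands in for the output of a named entity recognizer such as
    a BiLSTM-CRF model; the BIO tag format is an assumption.
    """
    out = []
    for ch, tag in zip(chars, tags):
        entity_type = tag.split("-", 1)[1] if "-" in tag else None
        out.append("*" if entity_type in sensitive else ch)
    return "".join(out)


def mask_by_regex(text):
    """Replace every regex match with asterisks of the same length."""
    for label in ("ID", "PHONE"):
        text = PATTERNS[label].sub(lambda m: "*" * len(m.group()), text)
    return text


def mask_record(chars, tags):
    """Mask tagged entities first, then structured identifiers."""
    return mask_by_regex(mask_by_tags(chars, tags))


if __name__ == "__main__":
    text = "患者张三,电话13812345678"
    # Hand-written tags standing in for model predictions.
    tags = ["O", "O", "B-NAME", "I-NAME"] + ["O"] * (len(text) - 4)
    print(mask_record(list(text), tags))
```

Keeping the two passes separate means the tag-based pass can follow whatever entity types the recognizer emits, while the regular expression pass catches fixed-format identifiers that need no model.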
Pages: 3657-3673
Page count: 17