Hybrid medical named entity recognition using document structure and surrounding context

Cited by: 1
Authors
Landolsi, Mohamed Yassine [1]
Romdhane, Lotfi Ben [1]
Hlaoua, Lobna [1]
Affiliation
[1] University of Sousse, MARS Research Lab, SDM Research Group, ISITCom, LR17ES05, Hammam Sousse, Tunisia
Source
The Journal of Supercomputing | 2024 / Vol. 80 / No. 4
Keywords
Medical text mining; Named entity recognition; Machine learning; Information extraction; Electronic medical records; Section identification
DOI
10.1007/s11227-023-05647-9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A huge amount of electronic medical documents is now written in natural language by medical specialists, and these documents contain information needed for many medical tasks. However, reading them to find specific information is tedious. Automatic information extraction has therefore become an essential but challenging task, in particular Named Entity Recognition (NER). NER is crucial for extracting valuable information used in medical tasks such as clinical decision support and drug safety surveillance. Capturing sufficient context is important for efficient NER, yet some important contextual information is not well exploited in the literature. Usually, a standard sequence segmentation such as sentence segmentation is used, which may not capture sufficient context. In this paper, we propose a supervised NER method called MedSINE (Medical Section Identification to enhance the Named Entity tagging), which treats NER as a sequence tagging task solved with a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field layer (BiLSTM-CRF). To this end, we exploit layout information to segment the text into chunk sequences and to extract the parent sections of each word as features that provide sufficient context. In addition, we use clinical Bidirectional Encoder Representations from Transformers (BERT) word embeddings, Part-of-Speech (PoS) tags, and entity surrounding sequence features. Experiments were conducted on a manually annotated dataset of real Summary of Product Characteristics (SmPC) medical documents in PDF format and on the Colorado Richly Annotated Full Text (CRAFT) corpus. Under strict matching evaluation, our model achieved F1-measures of 89.49% and 73.52% on the SmPC and CRAFT datasets, respectively. The results show that using the sequence of parent sections improves the strict-matching F1-measure by 4.71%.
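The abstract states that MedSINE uses document layout to find the parent sections of each word and feeds them to the BiLSTM-CRF as features. The Python sketch below illustrates only that feature-extraction idea, under stated assumptions: the paper works from PDF layout information, whereas here heading detection is replaced by a hypothetical regular expression (`HEADING`) for numbered SmPC-style headings such as "4.8 Undesirable effects". The function name `parent_section_features` is illustrative and not taken from the paper, and tokenization is plain whitespace splitting.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for layout-based heading detection: a numbered,
# SmPC-style heading such as "4. CLINICAL PARTICULARS" or
# "4.8 Undesirable effects". Requiring an uppercase first letter in the
# title is a crude guard against body lines like "5 mg daily".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z].*)$")


def parent_section_features(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Pair every body-text token with the titles of its enclosing sections.

    Returns (token, [outermost_title, ..., innermost_title]) tuples.
    """
    open_sections: List[Tuple[int, str]] = []  # (depth, title), outermost first
    features: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            # "4" -> depth 1, "4.8" -> depth 2, and so on.
            depth = match.group(1).count(".") + 1
            # Close any open section at the same or a deeper level.
            while open_sections and open_sections[-1][0] >= depth:
                open_sections.pop()
            open_sections.append((depth, match.group(2).strip()))
            continue
        parents = [title for _, title in open_sections]
        # Whitespace tokenization is a simplification.
        for token in line.split():
            features.append((token, list(parents)))
    return features


if __name__ == "__main__":
    sample = [
        "4. CLINICAL PARTICULARS",
        "4.8 Undesirable effects",
        "Headache and nausea were reported.",
        "5. PHARMACOLOGICAL PROPERTIES",
        "Rapidly absorbed.",
    ]
    for token, parents in parent_section_features(sample):
        print(token, "->", " > ".join(parents))
```

Running the sample prints, for instance, "Headache -> CLINICAL PARTICULARS > Undesirable effects", while "Rapidly" is attached only to "PHARMACOLOGICAL PROPERTIES" because opening section 5 closes both section 4 and subsection 4.8. In a full tagger these title sequences would still need some encoding before being combined with the BERT embeddings and PoS features mentioned in the abstract. The uppercase-title rule is only a heuristic and would still misread a body line such as "10 Tablets" as a heading.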
Pages: 5011-5041
Number of pages: 31