Hybrid medical named entity recognition using document structure and surrounding context

Cited by: 1
Authors
Landolsi, Mohamed Yassine [1]
Romdhane, Lotfi Ben [1]
Hlaoua, Lobna [1]
Affiliation
[1] Univ Sousse, MARS Res Lab, SDM Res Grp, ISITCom, LR17ES05, Hammam Sousse, Tunisia
Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / No. 4
Keywords
Medical text mining; Named entity recognition; Machine learning; Information extraction; Electronic medical records; Section identification
DOI
10.1007/s11227-023-05647-9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays, medical specialists create a huge number of electronic medical documents in natural language, and these documents contain useful information needed for several medical tasks. However, reading them to find specific information is a tedious task. Thus, extracting information automatically has become an essential and challenging task, especially Named Entity Recognition (NER). NER is crucial for extracting valuable information used in various medical tasks such as clinical decision support and drug safety surveillance. Capturing sufficient context is important for efficient NER, yet in the literature some important contextual information is not well exploited. Usually, a standard sequence segmentation such as sentence segmentation is used, which may not cover sufficient context. In this paper, we propose a supervised NER method called MedSINE (Medical Section Identification to enhance the Named Entity tagging), which is based on sequence tagging using a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field layer (BiLSTM-CRF). To this end, we exploit layout information to segment the text into chunk sequences and to extract the parent sections of each word as features that provide sufficient context. In addition, we use clinical Bidirectional Encoder Representations from Transformers (BERT) word embeddings, Part of Speech (PoS) tags, and entity surrounding sequence features. Experiments were conducted on a manually annotated dataset of real Summary of Product Characteristics (SmPC) medical documents in PDF format and on the Colorado Richly Annotated Full Text (CRAFT) corpus. Our model achieved F1-measures of 89.49% and 73.52% under strict matching evaluation on the SmPC and CRAFT datasets, respectively. The results show that employing the sequence of parent sections improves the F1-measure by 4.71% under strict matching evaluation.
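The abstract reports its scores under strict matching evaluation. The following is a minimal sketch of that metric, assuming BIO-encoded tag sequences (the record does not state the paper's tagging scheme). A predicted entity counts as correct only when its start, end, and type all equal those of a gold entity. The entity types `DRUG` and `DOSE` and the function names are hypothetical illustrations, not the SmPC label set or the authors' code.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of (start, end, type) spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith(("B-", "I-")):
                # A stray I- tag (no matching open entity) is treated as a new start
                start, etype = i, tag[2:]
    return spans


def strict_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under strict matching:
    boundaries AND type must match a gold entity exactly."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels; the prediction truncates the DRUG entity by one token
    gold = [["B-DRUG", "I-DRUG", "O", "B-DOSE"]]
    pred = [["B-DRUG", "O", "O", "B-DOSE"]]
    print(strict_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

In this example the truncated `DRUG` prediction earns no credit, so precision, recall, and F1 are all 0.5. A relaxed (overlap-based) scheme would credit it, which is why strict scores are typically the more conservative figure.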
Pages: 5011-5041
Number of pages: 31
Related papers
50 in total
  • [41] Named Entity Recognition based on a Graph Structure
    Munoz, David
    Perez, Fernando
    Pinto, David
    COMPUTACION Y SISTEMAS, 2020, 24 (02): : 553 - 563
  • [42] Ontology Attention Layer for Medical Named Entity Recognition
    Zha, Yue
    Ke, Yuanzhi
    Hu, Xiao
    Xiong, Caiquan
    APPLIED SCIENCES-BASEL, 2024, 14 (01):
  • [43] ViMedNER: A Medical Named Entity Recognition Dataset for Vietnamese
    Duong, Pham Van
    Trinh, Tien-Dat
    Nguyen, Minh-Tien
    Vu, Huy-The
    Pham, Minh-Chuan
    Tuan, Tran Manh
    Son, Le Hoang
    EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 2024, 11 (04)
  • [44] Hybrid Named Entity Recognition - Application to Arabic Language
    Meselhi, Mohamed A.
    Bakr, Hitham M. Abo
    Ziedan, Ibrahim
    Shaalan, Khaled
    2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2014, : 80 - 85
  • [45] Named Entity Recognition in Unstructured Medical Text Documents
    Pearson, Cole
    Seliya, Naeem
    Dave, Rushit
    INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ENERGY TECHNOLOGIES (ICECET 2021), 2021, : 412 - 417
  • [46] Advances in Named Entity Recognition in Electronic Medical Record
    Liu, Andong
    Peng, Lin
    Ye, Qing
    Du, Jianqiang
    Cheng, Chunlei
    Zha, Qinglin
    Computer Engineering and Applications, 2023, 59 (21) : 39 - 51
  • [47] A Hybrid Named Entity Recognition System for Aviation Text
    Bharathi, A.
    Ramdin, Robin
    Babu, Preeja
    Menon, Vijay Krishna
    Jayaramakrishnan, Chandrasekhar
    Lakshmikumar, Sudarsan
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (01)
  • [48] Classification Ensemble to Improve Medical Named Entity Recognition
    Keretna, Sara
    Lim, Chee Peng
    Creighton, Doug
    Shaban, Khaled Bashir
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 2630 - 2636
  • [49] A Novel Hybrid Approach to Arabic Named Entity Recognition
    Meselhi, Mohamed A.
    Bakr, Hitham M. Abo
    Ziedan, Ibrahim
    Shaalan, Khaled
    MACHINE TRANSLATION, CWMT 2014, 2014, 493 : 93 - 103
  • [50] ChemSpot: a hybrid system for chemical named entity recognition
    Rocktaschel, Tim
    Weidlich, Michael
    Leser, Ulf
    BIOINFORMATICS, 2012, 28 (12) : 1633 - 1640