DeIDNER Corpus: Annotation of Clinical Discharge Summary Notes for Named Entity Recognition Using BRAT Tool

Cited by: 4
Authors
Syed, Mahanazuddin [1]
Al-Shukri, Shaymaa [1]
Syed, Shorabuddin [1]
Sexton, Kevin [1]
Greer, Melody L. [1]
Zozus, Meredith [2]
Bhattacharyya, Sudeepa [3]
Prior, Fred [1]
Affiliations
[1] Univ Arkansas Med Sci, Dept Biomed Informat, Little Rock, AR USA
[2] Univ Texas Hlth Sci Ctr San Antonio, Dept Populat Hlth Sci, San Antonio, TX USA
[3] Arkansas State Univ, Dept Biol Sci & Arkansas Biosci Inst, Jonesboro, AR USA
Funding
National Institutes of Health (USA)
Keywords
Named Entity Recognition; Annotation; De-identification; Clinical Corpus; Natural Language Processing; Agreement
DOI
10.3233/SHTI210195
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Named Entity Recognition (NER), which aims to identify and classify entities into predefined categories, is a critical pre-processing task in Natural Language Processing (NLP) pipelines. Readily available off-the-shelf NER algorithms or programs are trained on a general corpus and often need to be retrained when applied to a different domain. The end model's performance depends on the quality of the named entities that these NER models generate for the NLP task. To improve NER model accuracy, researchers build domain-specific corpora for both model training and evaluation. However, in the clinical domain, there is a dearth of training data because of privacy concerns, forcing many studies to use NER models trained in non-clinical domains to generate the NER feature set. This, in turn, affects the performance of downstream NLP tasks such as information extraction and de-identification. In this paper, our objective is to create a high-quality annotated clinical corpus for training NER models that generalize easily and can be used in a downstream de-identification task to generate a named-entity feature set.
Pages: 432-436
Number of pages: 5