Identifying named entities from PubMed® for enriching semantic categories

Cited by: 6
Authors
Kim, Sun [1]
Lu, Zhiyong [1]
Wilbur, John [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
Semantic term extraction; Natural language processing; Machine learning; CONCEPT EXTRACTION; BIOMEDICAL TEXT; RECOGNITION;
DOI
10.1186/s12859-015-0487-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Controlled vocabularies such as the Unified Medical Language System (UMLS®) and Medical Subject Headings (MeSH®) are widely used for biomedical natural language processing (NLP) tasks. However, the standard terminology in such collections suffers from low usage in biomedical literature; for example, only 13% of UMLS terms appear in MEDLINE®.
Results: We propose an efficient and effective method for extracting noun phrases for biomedical semantic categories. The proposed approach uses simple linguistic patterns to select candidate noun phrases based on headwords, and a machine learning classifier filters out noisy phrases. In the experiments, three NLP rules were tested and manually evaluated by three annotators. Our approaches showed over 93% precision on average for the headwords "gene", "protein", "disease", "cell" and "cells".
Conclusions: Although biomedical terms in knowledge-rich resources may define semantic categories, variations of the controlled terms in literature are still difficult to identify. The method proposed here is an effort to narrow the gap between controlled vocabularies and the entities used in text. Our extraction method cannot completely eliminate manual evaluation; however, a simple and automated solution with high precision provides a convenient way to enrich semantic categories by incorporating terms obtained from the literature.
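The headword-based candidate selection described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the three NLP rules or the classifier, so everything below is an assumption made for illustration: the function name `candidate_phrases`, the `BOUNDARY_WORDS` stop list, and the rule "take up to three modifier tokens immediately before a headword" are hypothetical and are not the authors' patterns. The classifier-based filtering step is omitted.

```python
import re

# Headwords listed in the abstract.
HEADWORDS = ("gene", "protein", "disease", "cell", "cells")

# Hypothetical stop list: words that end the leftward scan for modifiers.
BOUNDARY_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "or", "to", "with",
    "by", "for", "from", "is", "are", "was", "were", "that", "this",
}


def candidate_phrases(sentence, headwords=HEADWORDS, max_modifiers=3):
    """Return phrases that end in a headword and have at least one modifier.

    Illustrative rule only (an assumption, not the paper's method): scan left
    from each headword, collecting up to `max_modifiers` tokens, and stop at
    a boundary word or a punctuation mark.
    """
    # Words (letters, digits, hyphens) or single punctuation characters.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*|[^\sA-Za-z0-9]", sentence)
    phrases = []
    for i, token in enumerate(tokens):
        if token.lower() not in headwords:
            continue
        start = i
        while start > 0 and i - start < max_modifiers:
            previous = tokens[start - 1]
            if previous.lower() in BOUNDARY_WORDS or not previous[0].isalnum():
                break
            start -= 1
        if start < i:  # keep only headwords that carry a modifier
            phrases.append(" ".join(tokens[start:i + 1]))
    return phrases


if __name__ == "__main__":
    print(candidate_phrases("Mutations in the BRCA1 tumor suppressor gene increase risk."))
    print(candidate_phrases("Activated T cells secrete the protein."))
```

In a pipeline of the kind the abstract outlines, candidates like these would then be passed to a trained classifier that discards noisy phrases; that step depends on features and training data the abstract does not describe.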
Pages: 10
Related papers
50 in total
  • [41] Ontology-Based Query Expansion with Latently Related Named Entities for Semantic Text Search
    Ngo, Vuong M.
    Cao, Tru H.
    [J]. ADVANCES IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, 2010, 283 : 41 - 52
  • [42] Extraction of Semantic Relation Between Arabic Named Entities Using Different Kinds of Transducer Cascades
    Ben Mesmia, Fatma
    Bouabidi, Kaouther
    Haddar, Kais
    Friburger, Nathalie
    Maurel, Denis
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 475 - 487
  • [43] Using Conditional Probability for Discovering Semantic Relationships between Named Entities in Cultural Heritage Data
    Stoikov, Jordan
    [J]. DIGITAL PRESENTATION AND PRESERVATION OF CULTURAL AND SCIENTIFIC HERITAGE, 2021, 11 : 77 - 87
  • [44] Label propagation via bootstrapped support vectors for semantic relation extraction between named entities
    GuoDong, Zhou
    LongHua, Qian
    QiaoMing, Zhu
    [J]. COMPUTER SPEECH AND LANGUAGE, 2009, 23 (04) : 464 - 478
  • [45] Extracting Named Entities from Prophetic Narration Texts (Hadith)
    Harrag, Fouzi
    El-Qawasmeh, Eyas
    Al-Salman, Abdul Malik Salman
    [J]. SOFTWARE ENGINEERING AND COMPUTER SYSTEMS, PT 2, 2011, 180 : 289 - +
  • [46] Extraction of named entities from tables in gene mutation literature
    NICTA Victoria Research Laboratory, Australia
    [Unknown]
    [J]. ADCS - Proc. Thirteenth Australasian Doc. Comput. Symp., 2008, (49-52):
  • [47] Mining Named Entities from Search Engine Query Logs
    Alasiry, Areej
    Levene, Mark
    Poulovassilis, Alexandra
    [J]. PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14), 2014, : 46 - 56
  • [48] FROM LINGUISTICS TO ONTOLOGIES The Role of Named Entities in the Conceptualisation Process
    Omrane, Nouha
    Nazarenko, Adeline
    Szulman, Sylvie
    [J]. KEOD 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE ENGINEERING AND ONTOLOGY DEVELOPMENT, 2011, : 249 - 254
  • [49] Investigation of features for extraction of named entities from texts in Russian
    V. A. Mozharova
    N. V. Lukashevich
    [J]. Automatic Documentation and Mathematical Linguistics, 2017, 51 (3) : 127 - 134
  • [50] ArabiaNer: A System to Extract Named Entities from Arabic Content
    Hudhud, Mohammad
    Abdelhaq, Hamed
    Mohsen, Fadi
    [J]. ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2021, : 489 - 497