Identifying named entities from PubMed® for enriching semantic categories

Cited by: 6
Authors
Kim, Sun [1]
Lu, Zhiyong [1]
Wilbur, W. John [1]
Affiliation
[1] National Library of Medicine, National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
Source
BMC Bioinformatics | 2015 / Vol. 16
Keywords
Semantic term extraction; Natural language processing; Machine learning; Concept extraction; Biomedical text; Recognition
DOI
10.1186/s12859-015-0487-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Controlled vocabularies such as the Unified Medical Language System (UMLS®) and Medical Subject Headings (MeSH®) are widely used for biomedical natural language processing (NLP) tasks. However, the standard terminology in such collections is rarely used in the biomedical literature; for example, only 13% of UMLS terms appear in MEDLINE®.

Results: We propose an efficient and effective method for extracting noun phrases for biomedical semantic categories. The approach uses simple linguistic patterns to select candidate noun phrases based on headwords, and a machine learning classifier then filters out noisy phrases. In the experiments, three NLP rules were tested and manually evaluated by three annotators. Our approaches achieved over 93% precision on average for the headwords "gene", "protein", "disease", "cell" and "cells".

Conclusions: Although biomedical terms in knowledge-rich resources may define semantic categories, variations of the controlled terms in the literature remain difficult to identify. The proposed method is an effort to narrow the gap between controlled vocabularies and the entities used in text. Our extraction method cannot completely eliminate manual evaluation; however, a simple, automated solution with high precision offers a convenient way to enrich semantic categories with terms obtained from the literature.
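The headword-based candidate selection described in the abstract can be illustrated with a minimal sketch. It assumes each sentence is already tokenized and part-of-speech tagged with Penn Treebank tags, and it uses a hypothetical pattern (one or more adjectives, nouns or numbers immediately preceding a headword, listed in `MODIFIER_TAGS`). This is not the paper's rule set: the three NLP rules actually tested and the machine learning filter are not reproduced here, and the names `extract_candidates`, `MODIFIER_TAGS` and `HEADWORDS` are illustrative only.

```python
from typing import Iterable, List, Tuple

# Hypothetical modifier pattern (an assumption, not the paper's rules):
# Penn Treebank tags allowed to the left of a headword inside a candidate.
MODIFIER_TAGS = {"JJ", "NN", "NNS", "NNP", "NNPS", "CD"}

# Headwords named in the abstract.
HEADWORDS = {"gene", "protein", "disease", "cell", "cells"}


def extract_candidates(tagged: List[Tuple[str, str]],
                       headwords: Iterable[str] = HEADWORDS) -> List[str]:
    """Return phrases of one or more modifiers followed by a headword.

    `tagged` is a list of (token, POS tag) pairs for a single sentence.
    """
    heads = {h.lower() for h in headwords}
    candidates = []
    for i, (token, tag) in enumerate(tagged):
        # Only noun tokens whose surface form is a target headword qualify.
        if token.lower() not in heads or not tag.startswith("NN"):
            continue
        # Walk left from the headword while tokens match the modifier pattern.
        start = i
        while start > 0 and tagged[start - 1][1] in MODIFIER_TAGS:
            start -= 1
        # Require at least one modifier, so a bare "gene" is not a candidate.
        if start < i:
            candidates.append(" ".join(tok for tok, _ in tagged[start:i + 1]))
    return candidates


if __name__ == "__main__":
    sentence = [("The", "DT"), ("tumor", "NN"), ("suppressor", "NN"),
                ("gene", "NN"), ("p53", "NN"), ("is", "VBZ"),
                ("mutated", "VBN"), ("in", "IN"), ("many", "JJ"),
                ("cancer", "NN"), ("cells", "NNS"), (".", ".")]
    print(extract_candidates(sentence))
    # → ['tumor suppressor gene', 'many cancer cells']
```

On this sample sentence the sketch keeps "tumor suppressor gene" but also returns "many cancer cells", whose generic modifier "many" makes it the kind of noisy phrase that the abstract's classifier step is meant to filter out.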
Pages: 10