Recognizing software names in biomedical literature using machine learning

Cited by: 7
Authors
Wei, Qiang [1]
Zhang, Yaoyun [1]
Amith, Muhammad [1]
Lin, Rebecca [2]
Lapeyrolerie, Jenay [3]
Tao, Cui [1]
Xu, Hua [1]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
[3] Baylor Univ, Waco, TX 76798 USA
Keywords
biomedical literature; biomedical software; biomedical software index; named entity recognition; natural language processing; BIOINFORMATICS; SERVICES
DOI
10.1177/1460458219869490
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are built mainly through manual curation, which is time-consuming and does not scale. This study manually annotated software names in 1,120 MEDLINE titles and abstracts and used this corpus to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features derived from clustered and binarized word embeddings. Using inexact matching criteria, our best system achieved an F-measure of 91.79% for recognizing software names in titles and an F-measure of 86.35% for recognizing software names in both titles and abstracts. We then used the developed system to create a biomedical software catalog with 19,557 entries. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
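The abstract lists binarized word embeddings among the unsupervised feature types. As a rough illustration, here is a minimal sketch of one common binarization scheme, which is assumed here rather than taken from the paper. Within each embedding dimension, a value at or above the mean of that dimension's positive entries becomes `U+`. A value at or below the mean of its negative entries becomes `B-`. Everything in between is treated as `0` and emits no feature. The function names, the toy vocabulary, and the `embN=...` feature-string format are hypothetical. The clustered variant, which assigns each word a cluster ID (for example from k-means), is not shown.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dimension_thresholds(embeddings: Dict[str, Vector]) -> List[Tuple[float, float]]:
    """For each dimension, return (mean of positive values, mean of negative values)."""
    dim = len(next(iter(embeddings.values())))
    thresholds = []
    for j in range(dim):
        column = [vec[j] for vec in embeddings.values()]
        pos = [v for v in column if v > 0]
        neg = [v for v in column if v < 0]
        # If a dimension has no positive (or negative) values, use an infinite
        # threshold so that side never fires.
        pos_mean = sum(pos) / len(pos) if pos else float("inf")
        neg_mean = sum(neg) / len(neg) if neg else float("-inf")
        thresholds.append((pos_mean, neg_mean))
    return thresholds


def binarize(vec: Vector, thresholds: List[Tuple[float, float]]) -> List[str]:
    """Map each dimension to 'U+' (strongly positive), 'B-' (strongly negative) or '0'."""
    labels = []
    for v, (pos_mean, neg_mean) in zip(vec, thresholds):
        if v >= pos_mean:
            labels.append("U+")
        elif v <= neg_mean:
            labels.append("B-")
        else:
            labels.append("0")
    return labels


def embedding_features(token: str,
                       embeddings: Dict[str, Vector],
                       thresholds: List[Tuple[float, float]]) -> List[str]:
    """Sparse string features for one token (hypothetical format); '0' dims are skipped."""
    vec = embeddings.get(token.lower())
    if vec is None:
        return []  # out-of-vocabulary tokens get no embedding features
    return [f"emb{j}={lab}"
            for j, lab in enumerate(binarize(vec, thresholds))
            if lab != "0"]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented purely for illustration.
    emb = {"blast": [0.75, -0.25], "software": [0.25, -0.75], "the": [-0.5, 0.5]}
    th = dimension_thresholds(emb)
    for tok in ["BLAST", "Software", "unseen"]:
        print(tok, embedding_features(tok, emb, th))
```

For the toy vocabulary, both dimensions get thresholds of (0.5, -0.5). As a result, `BLAST` yields `['emb0=U+']`, `Software` yields `['emb1=B-']`, and an unseen word yields no features. Discrete strings like these can be passed to a sequence labeler such as a CRF alongside other token features. Raw real-valued vectors are less convenient for that kind of linear model.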
Pages: 21-33
Number of pages: 13