Recognizing software names in biomedical literature using machine learning

Cited by: 7

Authors:

Wei, Qiang [1]

Zhang, Yaoyun [1]

Amith, Muhammad [1]

Lin, Rebecca [2]

Lapeyrolerie, Jenay [3]

Tao, Cui [1]

Xu, Hua [1]

Affiliations:

[1] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA

[2] Johns Hopkins Univ, Baltimore, MD 21218 USA

[3] Baylor Univ, Waco, TX 76798 USA

Source:

HEALTH INFORMATICS JOURNAL | 2020 / Vol. 26 / Issue 01

Keywords:

biomedical literature; biomedical software; biomedical software index; named entity recognition; natural language processing; BIOINFORMATICS; SERVICES;

DOI:

10.1177/1460458219869490

Chinese Library Classification (CLC) number:

R19 [Health organizations and services (health services administration)];

Abstract:

Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are mainly built using manual curation, which is time-consuming and unscalable. This study took the initiative to manually annotate software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features of clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
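
The abstract reports F-measures under "inexact matching criteria" but does not define them here. As a rough illustration only, the sketch below assumes that inexact matching means a predicted span counts as correct when it overlaps a gold span at all; the span format, the `overlaps` helper, the `f_measure` function, and the sample spans are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive (assumed format)


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def f_measure(gold: List[Span], pred: List[Span], inexact: bool = True) -> float:
    """F1 over entity spans under exact or overlap-based (inexact) matching."""
    match = overlaps if inexact else (lambda a, b: a == b)
    # A predicted span counts toward precision if it matches any gold span.
    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # A gold span counts toward recall if any predicted span matches it.
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up spans: one exact hit, one partial overlap, one miss on each side.
gold = [(0, 2), (5, 6), (9, 11)]
pred = [(0, 1), (5, 6), (12, 13)]
exact_f = f_measure(gold, pred, inexact=False)
inexact_f = f_measure(gold, pred, inexact=True)
```

With these made-up spans the partial overlap is rewarded only in the inexact setting, which is why inexact scores are never lower than exact scores on the same predictions.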

Pages: 21-33

Page count: 13
