Recognizing software names in biomedical literature using machine learning

Cited by: 7

Authors:

Wei, Qiang [1]

Zhang, Yaoyun [1]

Amith, Muhammad [1]

Lin, Rebecca [2]

Lapeyrolerie, Jenay [3]

Tao, Cui [1]

Xu, Hua [1]

Affiliations:

[1] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA

[2] Johns Hopkins Univ, Baltimore, MD 21218 USA

[3] Baylor Univ, Waco, TX 76798 USA

Source:

HEALTH INFORMATICS JOURNAL | 2020 / Vol. 26 / Issue 01

Keywords:

biomedical literature; biomedical software; biomedical software index; named entity recognition; natural language processing; BIOINFORMATICS; SERVICES

DOI:

10.1177/1460458219869490

Chinese Library Classification (CLC) number:

R19 [Health care organizations and services (health services administration)]

Abstract:

Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are built mainly through manual curation, which is time-consuming and does not scale. This study manually annotated software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Two feature engineering strategies were proposed: (1) domain knowledge features and (2) unsupervised word representation features based on clustered and binarized word embeddings. The best system achieved an F-measure of 91.79% for recognizing software in titles and an F-measure of 86.35% for recognizing software in both titles and abstracts, using inexact matching criteria. The developed system was then used to create a biomedical software catalog with 19,557 entries. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.

Pages: 21-33

Page count: 13
