Recognizing software names in biomedical literature using machine learning

Cited by: 7
Authors
Wei, Qiang [1]
Zhang, Yaoyun [1]
Amith, Muhammad [1]
Lin, Rebecca [2]
Lapeyrolerie, Jenay [3]
Tao, Cui [1]
Xu, Hua [1]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
[3] Baylor Univ, Waco, TX 76798 USA
Keywords
biomedical literature; biomedical software; biomedical software index; named entity recognition; natural language processing; BIOINFORMATICS; SERVICES
DOI
10.1177/1460458219869490
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are mainly built using manual curation, which is time-consuming and unscalable. This study took the initiative to manually annotate software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features of clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
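The abstract reports F-measures under inexact matching criteria but this record does not define them. As a rough illustration only, the sketch below assumes that a predicted software mention counts as correct when its character span overlaps a gold span at all. The overlap rule, the `Span` type, and the function names `overlaps` and `inexact_prf` are assumptions made for this example, not the authors' evaluation code.

```python
from typing import List, Tuple

# Hypothetical span type: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def inexact_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure under an assumed overlap criterion.

    A predicted span is a hit if it overlaps any gold span; a gold span is
    recovered if any predicted span overlaps it.
    """
    hit_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    hit_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up offsets for illustration: one partial match, one miss each way.
    gold_spans = [(0, 5), (10, 20)]
    pred_spans = [(2, 5), (30, 35)]
    print(inexact_prf(gold_spans, pred_spans))
```

Under exact matching, the partial span `(2, 5)` in this toy input would count as an error, which is why inexact scores are typically at least as high as exact ones.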
Pages: 21-33
Page count: 13