Recognizing software names in biomedical literature using machine learning

Cited by: 7
Authors
Wei, Qiang [1 ]
Zhang, Yaoyun [1 ]
Amith, Muhammad [1 ]
Lin, Rebecca [2 ]
Lapeyrolerie, Jenay [3 ]
Tao, Cui [1 ]
Xu, Hua [1 ]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
[3] Baylor Univ, Waco, TX 76798 USA
Keywords
biomedical literature; biomedical software; biomedical software index; named entity recognition; natural language processing; BIOINFORMATICS; SERVICES;
DOI
10.1177/1460458219869490
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Abstract
Software tools are now essential to research and applications in the biomedical domain. However, existing software repositories are mainly built through manual curation, which is time-consuming and does not scale. In this study, software names were manually annotated in 1,120 MEDLINE abstracts and titles, and the resulting corpus was used to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features derived from clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
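
The abstract reports F-measures under inexact matching criteria but does not define them in this record. As a minimal sketch, assuming that "inexact" means a predicted software mention counts as correct when its token span overlaps a gold-standard span, the scores could be computed as follows. The names `Span`, `overlaps`, and `inexact_prf` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A mention is a (start, end) token offset pair, end exclusive (an assumption).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True when the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def inexact_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure under overlap-based (inexact) matching."""
    # A prediction is correct if it overlaps any gold mention.
    matched_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    # A gold mention is found if any prediction overlaps it.
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    gold_spans = [(0, 2), (5, 6), (10, 12)]
    pred_spans = [(1, 2), (5, 6), (20, 21)]
    print(inexact_prf(gold_spans, pred_spans))
```

Under exact matching, the partially overlapping prediction `(1, 2)` would count as an error; under this overlap criterion it counts as a hit, which is why inexact scores are typically at least as high as exact ones.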
Pages: 21-33
Number of pages: 13